Automatic prefetching in querysets

Automatic prefetching in querysets

tolomea
I'd like to discuss automatic prefetching in querysets. Specifically automatically doing prefetch_related where needed without the user having to request it.

For context consider these three snippets using the Question & Choice models from the tutorial when there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB

I've included select_related for completeness; I'm not going to propose changing anything about its use. There are places where it is the best choice, and in those places it will still be up to the user to request it. I will note that anywhere select_related is optimal, prefetch_related is still better than the default, and leave it at that.
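
The query counts quoted for the three snippets can be checked outside Django. The sketch below is not ORM code: it uses the standard-library sqlite3 module with hand-written SQL that only approximates what each approach sends to the database. The table layout and the helper names (build_db, CountingDB, run) are assumptions made for illustration.

```python
import sqlite3


def build_db():
    """In-memory database with 100 questions and 5 choices each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, question_text TEXT)")
    conn.execute(
        "CREATE TABLE choice (id INTEGER PRIMARY KEY, question_id INTEGER, choice_text TEXT)"
    )
    for q in range(1, 101):
        conn.execute("INSERT INTO question VALUES (?, ?)", (q, f"Question {q}"))
        conn.executemany(
            "INSERT INTO choice (question_id, choice_text) VALUES (?, ?)",
            [(q, f"Choice {c}") for c in range(5)],
        )
    return conn


class CountingDB:
    """Wraps a connection and counts every SELECT that is issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def fetch(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def default_access(db):
    # One query for the choices, then one query per choice for its question.
    question_rows = 0
    for question_id, _text in db.fetch("SELECT question_id, choice_text FROM choice"):
        question_rows += len(
            db.fetch("SELECT question_text FROM question WHERE id = ?", (question_id,))
        )
    return question_rows


def prefetch_access(db):
    # One query for the choices, then a single IN query for the distinct questions.
    choices = db.fetch("SELECT question_id, choice_text FROM choice")
    ids = sorted({question_id for question_id, _text in choices})
    placeholders = ",".join("?" * len(ids))
    questions = db.fetch(
        f"SELECT id, question_text FROM question WHERE id IN ({placeholders})", ids
    )
    return len(questions)


def join_access(db):
    # One joined query; the question columns are repeated on every choice row.
    rows = db.fetch(
        "SELECT c.choice_text, q.question_text "
        "FROM choice c JOIN question q ON q.id = c.question_id"
    )
    return len(rows)


def run(strategy):
    """Return (number of queries, number of question rows fetched)."""
    db = CountingDB(build_db())
    question_rows = strategy(db)
    return db.queries, question_rows


if __name__ == "__main__":
    for strategy in (default_access, prefetch_access, join_access):
        print(strategy.__name__, run(strategy))
```

Under these assumptions the three strategies report 501, 2 and 1 queries respectively, with 500, 100 and 500 question rows fetched.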

The 'Default' example above is a classic example of the N+1 query problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know enough about the database and / or ORM to do otherwise.
Experienced users will also often produce this because it's not always obvious which fields will and won't be used, and consequently what should be prefetched.
Additionally that list will change over time. A small change to a template to display an extra field can result in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. Tools like django-perf-rec (which I was involved in creating) and nplusone exist in part to flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also produce queries like this because it's very difficult for them to know what needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but obviously it needs to be proved first so for now I'd suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in production. I'm not posting it yet because I'd like to focus on desired behavior rather than implementation details. But in summary, what it does is when accessing a missing field on a model, rather than fetching it just for that instance, it runs a prefetch_related query to fetch it for all peer instances that were fetched in the same queryset. So in the example above it prefetches all Questions in one query.

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of results, filtered basically all of them and the code is structured so that the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on a per queryset or per object basis. Leaving us with a rare downside that can easily be manually resolved in exchange for a significant general improvement.

In practice this thing is almost magical to work with. Unless you already have extensive and tightly maintained prefetches everywhere you get an immediate boost to virtually everything that touches the database, often knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in the work to productize the proof of concept.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Automatic prefetching in querysets

Marc Tamlyn
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley <[hidden email]> wrote:
[...]

Re: Automatic prefetching in querysets

tolomea
Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so 'choice.question' in the examples above) if it doesn't have the value cached from an earlier access, prefetch_related or select_related then Django will automatically perform a db query to fetch it. After that the value will then be cached on the object for any future dereferences.

This automatic fetching is the source of the N+1 query problems and, in my experience, of most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says for the group of objects fetched by this queryset, whenever one of these automatic foreign key queries happens on one of them instead of fetching the foreign key for just that one use the prefetch mechanism to fetch it for all of them.
The presumption being that the vast majority of the time when you access a field on one object from a queryset result, probably you are going to access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so something (admittedly contrived) like:
for choice in Choice.objects.all():
    print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one" direction, each of those queryset results will be at most the same size (in rows) as the preceding one and often (due to nulls and duplicates) smaller.
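
To make the three-query chain and the shrinking row counts concrete, here is a small sketch that does not use Django at all. The dict-based "tables", the column names and the helpers (fetch_by_ids, distinct_fk, run) are all hypothetical; each call that appends to the log stands in for one database query.

```python
# Hypothetical in-memory tables: primary key -> row.
AUTHORS = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
QUESTIONS = {
    1: {"author_id": 1},
    2: {"author_id": 1},     # duplicate author
    3: {"author_id": None},  # null foreign key
}
# Nine choices spread evenly over the three questions.
CHOICES = {pk: {"question_id": (pk % 3) + 1} for pk in range(1, 10)}


def fetch_by_ids(log, table_name, table, ids):
    """Simulate one `WHERE id IN (...)` query and record its size."""
    rows = [table[pk] for pk in ids]
    log.append((table_name, len(rows)))
    return rows


def distinct_fk(rows, column):
    """Distinct, non-null foreign key values found in a batch of rows."""
    return sorted({row[column] for row in rows if row[column] is not None})


def run():
    log = []
    # Query 1: all choices.
    choices = list(CHOICES.values())
    log.append(("choice", len(choices)))
    # Query 2: the questions of those choices, fetched as one batch.
    questions = fetch_by_ids(log, "question", QUESTIONS, distinct_fk(choices, "question_id"))
    # Query 3: the authors of those questions, fetched as one batch.
    fetch_by_ids(log, "author", AUTHORS, distinct_fk(questions, "author_id"))
    return log


if __name__ == "__main__":
    print(run())
```

With this sample data the log holds three entries with 9, 3 and 1 rows: duplicates collapse the nine choices to three questions, and the duplicate plus the null collapse those to a single author.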

I do not propose touching reverse foreign key or many2many fields as the generated queries could request substantially more rows from the DB than the original query and it's not at all clear how this mechanism would sanely interact with filtering etc. So this is purely about the forward direction of foreign keys.

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn <[hidden email]> wrote:
[...]

Re: Automatic prefetching in querysets

Collin Anderson-2
Hi Gordon,

How is it implemented? Does each object keep a reference to the queryset it came from?

Collin

On Tue, Aug 15, 2017 at 2:44 PM, Gordon Wrigley <[hidden email]> wrote:
[...]

Re: Automatic prefetching in querysets

tolomea
In my current version each object keeps a reference to a WeakSet of the results of the queryset it came from.
This is populated in _fetch_all, and if it is populated then ForwardManyToOneDescriptor does a prefetch across all the objects in the WeakSet instead of its regular fetching.
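
For readers who want to see the shape of this, here is a minimal pure-Python sketch of the idea. It is not the proof of concept (which has not been posted in this thread) and it does not touch Django: FakeDB, AutoPrefetchForeignKey, the _peers attribute and the fetch_all function are hypothetical stand-ins for the database, ForwardManyToOneDescriptor, the per-object WeakSet reference and _fetch_all respectively.

```python
import weakref


class FakeDB:
    """Stand-in for the database: a dict of questions plus a query counter."""

    def __init__(self, questions):
        self.questions = questions
        self.queries = 0

    def get_many(self, ids):
        # One simulated `WHERE id IN (...)` query.
        self.queries += 1
        return {pk: self.questions[pk] for pk in ids}


class AutoPrefetchForeignKey:
    """Descriptor that, on a cache miss, fetches for all live peers at once."""

    def __init__(self, id_attr):
        self.id_attr = id_attr

    def __set_name__(self, owner, name):
        self.cache_attr = "_cache_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.cache_attr not in instance.__dict__:
            # Fall back to a single-object fetch when there is no peer set.
            peers = list(instance._peers) if instance._peers is not None else [instance]
            missing = [p for p in peers if self.cache_attr not in p.__dict__]
            ids = {getattr(p, self.id_attr) for p in missing}
            related = instance._db.get_many(ids)
            for p in missing:
                p.__dict__[self.cache_attr] = related[getattr(p, self.id_attr)]
        return instance.__dict__[self.cache_attr]


class Choice:
    question = AutoPrefetchForeignKey("question_id")

    def __init__(self, db, question_id, choice_text):
        self._db = db
        self._peers = None  # filled in by fetch_all()
        self.question_id = question_id
        self.choice_text = choice_text


def fetch_all(db, rows):
    """Build the instances for one result set and link them as weak peers."""
    db.queries += 1  # the query that loaded the choices themselves
    instances = [Choice(db, question_id, text) for question_id, text in rows]
    peers = weakref.WeakSet(instances)
    for obj in instances:
        obj._peers = peers
    return instances


db = FakeDB({1: "What's new?", 2: "What's up?"})
choices = fetch_all(db, [(1, "Not much"), (1, "The sky"), (2, "Nothing")])
texts = [choice.question for choice in choices]
```

In this sketch the first access to choice.question triggers one batched fetch for every peer still alive, so the loop costs two queries in total rather than four. Because the peer set holds only weak references, objects that have already been garbage collected drop out of it and are not fetched for.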

On Tue, Aug 15, 2017 at 8:03 PM, Collin Anderson <[hidden email]> wrote:
Hi Gordon,

How is it implemented? Does each object keep a reference to the queryset it came from?

Collin

On Tue, Aug 15, 2017 at 2:44 PM, Gordon Wrigley <[hidden email]> wrote:
Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so 'choice.question' in the examples above) if it doesn't have the value cached from an earlier access, prefetch_related or select_related then Django will automatically perform a db query to fetch it. After that the value will then be cached on the object for any future dereferences.

This automatic fetching is the source the N+1 query problems and in my experience most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says for the group of objects fetched by this queryset, whenever one of these automatic foreign key queries happens on one of them instead of fetching the foreign key for just that one use the prefetch mechanism to fetch it for all of them.
The presumption being that the vast majority of the time when you access a field on one object from a queryset result, probably you are going to access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so something (admittedly contrived) like:
for choice in Choice.objects.all():
    
print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one" direction each of those queryset results will be at most the same size (in rows) as the proceeding one and often (due to nulls and duplicates) smaller.

I do not propose touching reverse foreign key or many2many fields as the generated queries could request substantially more rows from the DB than the original query and it's not at all clear how this mechanism would sanely interact with filtering etc. So this is purely about the forward direction of foreign keys.

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn <[hidden email]> wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?



Re: Automatic prefetching in querysets

Tom Forbes
In reply to this post by tolomea
Exploding query counts are definitely a pain point in Django, and anything that improves on that is worth considering. They have been a problem in all the Django projects I have seen.

However, I think the correct solution is for developers to correctly add select/prefetch calls. There is no general solution for automatically applying them that works for enough cases, and I think such a queryset method would be used incorrectly and too often.

Perhaps a better solution would be for Django to detect these O(n) query cases and display intelligent warnings, with suggestions as to the correct select/prefetch calls to add. When debug mode is enabled we could detect repeated foreign key referencing from the same source.
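
One way to picture the warning idea is a small counter that a debug-mode hook would call every time a foreign key is loaded lazily. This is a sketch only: LazyLoadMonitor and its record method are hypothetical and not part of Django, and "same source" is simplified here to the (model, field) pair rather than a queryset or call site.

```python
import warnings
from collections import Counter


class LazyLoadMonitor:
    """Counts lazy foreign key loads per (model, field); hypothetical helper."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, model_name, field_name):
        """Call once for each lazy load; warns once when the threshold is hit."""
        key = (model_name, field_name)
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            warnings.warn(
                f"{self.threshold} separate queries for "
                f"{model_name}.{field_name}; consider "
                f"select_related('{field_name}') or "
                f"prefetch_related('{field_name}')",
                RuntimeWarning,
                stacklevel=2,
            )
```

A hook of this kind only reports the problem after the queries have already run, so it complements explicit select/prefetch calls rather than replacing them.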

Re: Automatic prefetching in querysets

tolomea
In reply to this post by tolomea
I didn't answer your questions directly. Sorry for the quoting but it's the easiest way to deal with a half dozen questions.

How would possible prefetches be identified?

Wherever we currently automatically fetch a foreign key value.

What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch?

They each do whatever prefetch they need, just as a human optimizing this would add two prefetch clauses.

What about nested loops resulting in nested prefetches?

Nested loops only really come up when you are dealing with RelatedManagers which are outside the scope of this. Or did you have some other nested loop case in mind?

I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Having been lead engineer on a code base of ~100,000 lines with over 100 calls to prefetch_related and a lot of tests specifically for finding missing ones, I'd argue it's one of the worst aspects of working with Django's ORM at non-trivial scale.

Do you know of any other ORMs which attempt similar magical optimisations?

I don't, but unlike Django, where I have years of experience, I have next to no experience with other ORMs.

Regards G

Re: Automatic prefetching in querysets

tolomea
In reply to this post by Tom Forbes
The warnings you propose would certainly be an improvement on the status quo.
However, for that to be a complete solution Django would also need to detect places where there are redundant prefetch_related calls.

Additionally tools like the Admin and DRF would need to provide adequate hooks for inserting these calls.
For example, ModelAdmin.get_queryset is not really granular enough, as it's used by both the list and detail views, which might touch quite different sets of fields. (Although in practice what you generally do is optimize the list view, as that's the one that tends to explode.)

That aside, I sincerely believe that the proposed approach is superior to the current default behavior in the majority of cases and furthermore doesn't fail as badly as the current behavior when it's not appropriate. I expect that if it were implemented as an option, in time that belief would prove itself.

On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes <[hidden email]> wrote:
Exploding query counts are definitely a pain point in Django, anything to improve that is definitely worth considering. They have been a problem in all Django projects I have seen.

However I think the correct solution is for developers to correctly add select/prefetch calls. There is no general solution for automatically applying them that works for enough cases, and i think adding such a method to querysets would be used incorrectly and too often. 

Perhaps a better solution would be for Django to detect these O(n) query cases and display intelligent warnings, with suggestions as to the correct select/prefetch calls to add. When debug mode is enabled we could detect repeated foreign key referencing from the same source.

On 15 Aug 2017 19:44, "Gordon Wrigley" <[hidden email]> wrote:
Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so 'choice.question' in the examples above) if it doesn't have the value cached from an earlier access, prefetch_related or select_related then Django will automatically perform a db query to fetch it. After that the value will then be cached on the object for any future dereferences.

This automatic fetching is the source the N+1 query problems and in my experience most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says for the group of objects fetched by this queryset, whenever one of these automatic foreign key queries happens on one of them instead of fetching the foreign key for just that one use the prefetch mechanism to fetch it for all of them.
The presumption being that the vast majority of the time when you access a field on one object from a queryset result, probably you are going to access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so something (admittedly contrived) like:
for choice in Choice.objects.all():
    
print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one" direction each of those queryset results will be at most the same size (in rows) as the proceeding one and often (due to nulls and duplicates) smaller.

I do not propose touching reverse foreign key or many2many fields as the generated queries could request substantially more rows from the DB than the original query and it's not at all clear how this mechanism would sanely interact with filtering etc. So this is purely about the forward direction of foreign keys.

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn <[hidden email]> wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley <[hidden email]> wrote:
I'd like to discuss automatic prefetching in querysets. Specifically automatically doing prefetch_related where needed without the user having to request it.

For context consider these three snippets using the Question & Choice models from the tutorial when there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB
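
The three query patterns can be reproduced outside Django to check the counts. The sketch below uses the standard library's sqlite3 module as a stand-in for the ORM; the table layout and the helper names (`run`, `default_strategy`, `prefetch_strategy`, `join_strategy`) are assumptions made for this example, and the SQL only approximates what Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, question_text TEXT)")
conn.execute(
    "CREATE TABLE choice "
    "(id INTEGER PRIMARY KEY, question_id INTEGER, choice_text TEXT)"
)
# 100 questions with 5 choices each, 500 choices in total.
conn.executemany(
    "INSERT INTO question VALUES (?, ?)",
    [(q, f"question {q}") for q in range(1, 101)],
)
conn.executemany(
    "INSERT INTO choice VALUES (?, ?, ?)",
    [(c, (c - 1) // 5 + 1, f"choice {c}") for c in range(1, 501)],
)


def run(counter, sql, params=()):
    """Execute one statement, count it, and return all rows."""
    counter[0] += 1
    return conn.execute(sql, params).fetchall()


def default_strategy():
    """One query for the choices, then one per choice for its question."""
    counter, question_rows = [0], 0
    for _id, question_id, _text in run(counter, "SELECT * FROM choice"):
        rows = run(counter, "SELECT * FROM question WHERE id = ?", (question_id,))
        question_rows += len(rows)
    return counter[0], question_rows


def prefetch_strategy():
    """One query for the choices, one IN query for the distinct questions."""
    counter = [0]
    choices = run(counter, "SELECT * FROM choice")
    ids = sorted({question_id for _id, question_id, _text in choices})
    marks = ",".join("?" * len(ids))
    questions = run(counter, f"SELECT * FROM question WHERE id IN ({marks})", ids)
    return counter[0], len(questions)


def join_strategy():
    """A single JOIN; the question columns repeat on every choice row."""
    counter = [0]
    rows = run(
        counter,
        "SELECT c.*, q.* FROM choice c JOIN question q ON q.id = c.question_id",
    )
    return counter[0], len(rows)
```

Each function returns a pair of (number of queries, number of question rows transferred), which lines up with the figures quoted for the three snippets.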

I've included select_related for completeness; I'm not going to propose changing anything about its use. There are places where it is the best choice, and in those places it will still be up to the user to request it. I will note that anywhere select_related is optimal, prefetch_related is still better than the default, and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know enough about the database and / or ORM to do otherwise.
Experienced users will also often produce this because it's not always obvious which fields will and won't be used and, consequently, what should be prefetched.
Additionally that list will change over time. A small change to a template to display an extra field can result in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. Tools like django-perf-rec (which I was involved in creating) and nplusone exist in part to flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also produce queries like this because it's very difficult for them to know what needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but obviously it needs to be proved first so for now I'd suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in production. I'm not posting it yet because I'd like to focus on desired behavior rather than implementation details. But in summary, what it does is when accessing a missing field on a model, rather than fetching it just for that instance, it runs a prefetch_related query to fetch it for all peer instances that were fetched in the same queryset. So in the example above it prefetches all Questions in one query.

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of results, filtered basically all of them and the code is structured so that the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on a per queryset or per object basis. Leaving us with a rare downside that can easily be manually resolved in exchange for a significant general improvement.

In practice this thing is almost magical to work with. Unless you already have extensive and tightly maintained prefetches everywhere you get an immediate boost to virtually everything that touches the database, often knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in the work to productize the proof of concept.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

Adam Johnson-2
I'm biased here: Gordon was my boss for nearly three years, 2014-2016.

I'm in favour of adding the auto-prefetching; I've seen it work. It was created around midway through last year and applied to our application's Admin interface, which was covered in tests with django-perf-rec. After adding the automatic prefetch, we not only identified some existing N+1 query problems that hadn't been picked up (they were there in the performance record files, but there were so many queries that humans had missed some when reading them), we also found a number of stale prefetch queries that could be removed because the data they fetched wasn't being used. Additionally, not all of the admin interface was perf-rec-tested/optimized, so some pages were "just faster" with no extra effort.

I think there's a case for adding it to core - releasing as a third party package would make it difficult to use since it requires changes - mostly small - to QuerySet, ForeignKey, and the descriptor for ForeignKey. Users of such a package would have to use subclasses of all of these, the trickiest being ForeignKey since it triggers migrations... We had the luxury in our codebase of already having these subclasses for other customizations, so it was easier to roll out.

On 15 August 2017 at 20:35, Gordon Wrigley <[hidden email]> wrote:
The warnings you propose would certainly be an improvement on the status quo.
However for that to be a complete solution Django would also need to detect places where there are redundant prefetch_relateds.

Additionally tools like the Admin and DRF would need to provide adequate hooks for inserting these calls.
For example ModelAdmin.get_queryset is not really granular enough as it's used by both the list and detail views which might touch quite different sets of fields. (Although in practice what you generally do is optimize the list view as that's the one that tends to explode)

That aside, I sincerely believe that the proposed approach is superior to the current default behavior in the majority of cases and furthermore doesn't fail as badly as the current behavior when it's not appropriate. I expect that if implemented as an option then in time that belief would prove itself.

On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes <[hidden email]> wrote:
Exploding query counts are definitely a pain point in Django, and anything to improve that is worth considering. They have been a problem in all Django projects I have seen.

However, I think the correct solution is for developers to correctly add select/prefetch calls. There is no general solution for automatically applying them that works for enough cases, and I think such a method on querysets would be used incorrectly and too often.

Perhaps a better solution would be for Django to detect these O(n) query cases and display intelligent warnings, with suggestions as to the correct select/prefetch calls to add. When debug mode is enabled we could detect repeated foreign key referencing from the same source.

--
Adam

Re: Automatic prefetching in querysets

Josh Smeaton
I'm in favour of *some* automated way of handling these prefetches. Whether it's opt-in or opt-out, there should be some mechanism for protection, preferably with loud logging that directs users to the source of the automated hand-holding so they have an opportunity to disable or fine-tune the queries. Not all Django devs are ORM/database/Python experts; some are frontend devs just trying to get by. I know that this kind of behaviour would have saved our site from massive performance issues, and would likely also have guided these same devs to the source of potential issues.

Agreed that prefetch_related is strictly better than no prefetching at all, and is acceptable in the absence of select_related.

I think keeping the code back and focussing on the idea first is a good approach. I'd like to know whether people think opt-in or opt-out behaviour would be best.

For me - there are a few cases where automatically prefetching would *maybe* be the wrong thing to do. In the vast majority of cases it'd be better than the default of having nothing.

A few concerns:

- Be careful of `.iterator()` queries (that can't use prefetch)
- Could we warn of reverse M2M/ForeignKey at least?

Adam/Gordon, I'm interested in hearing how these changes led you to discovering stale prefetches?

On 15 August 2017 at 10:44, Gordon Wrigley <[hidden email]> wrote:
I'd like to discuss automatic prefetching in querysets. Specifically automatically doing prefetch_related where needed without the user having to request it.

For context consider these three snippets using the Question & Choice models from the tutorial (https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models) when there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB

I've included select_related for completeness; I'm not going to propose changing anything about its use. There are places where it is the best choice, and in those places it will still be up to the user to request it. I will note that anywhere select_related is optimal, prefetch_related is still better than the default, and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know enough about the database and / or ORM to do otherwise.
Experienced users will also often produce this because it's not always obvious which fields will and won't be used and, consequently, what should be prefetched.
Additionally that list will change over time. A small change to a template to display an extra field can result in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. Tools like django-perf-rec (https://github.com/YPlan/django-perf-rec), which I was involved in creating, and nplusone (https://github.com/jmcarp/nplusone) exist in part to flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also produce queries like this because it's very difficult for them to know what needs prefetching without being explicitly told by an experienced user.

As hinted at the top, I'd like to propose changing Django so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour, but obviously it needs to be proved first, so for now I'd suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in production. I'm not posting it yet because I'd like to focus on desired behavior rather than implementation details. In summary: when a missing field is accessed on a model instance, rather than fetching it just for that instance, it runs a prefetch_related query to fetch it for all peer instances that were fetched in the same queryset. So in the example above it prefetches all Questions in one query.
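
The mechanism described above can be pictured with a small simulation. This is only a sketch under stated assumptions, not the proof of concept (which has not been posted) and not Django code: QUESTIONS stands in for a database table, fetch_questions for a single query, and the Choice class, its _peers attribute and fetch_choices are hypothetical names. Each instance keeps a weak set of the instances created by the same fetch, and the first access to a missing related object loads it for every peer that is still alive.

```python
import weakref

QUESTIONS = {i: f"Question {i}" for i in range(1, 101)}  # stand-in for a DB table
query_log = []  # one entry per simulated query


def fetch_questions(ids):
    """Simulate one database query returning the requested question rows."""
    ids = sorted(set(ids))
    query_log.append(ids)
    return {i: QUESTIONS[i] for i in ids}


class Choice:
    """Hypothetical model instance with a lazily loaded related question."""

    def __init__(self, question_id, choice_text):
        self.question_id = question_id
        self.choice_text = choice_text
        self._question = None  # filled on first access
        self._peers = None     # weak set of instances from the same fetch

    @property
    def question(self):
        if self._question is None:
            # Missing on this instance: load it for every live peer that
            # also lacks it, using a single simulated query.
            peers = self._peers if self._peers is not None else (self,)
            pending = [p for p in peers if p._question is None]
            rows = fetch_questions(p.question_id for p in pending)
            for p in pending:
                p._question = rows[p.question_id]
        return self._question


def fetch_choices():
    """Simulate evaluating a queryset: build instances and link them as peers."""
    choices = [Choice(q, f"choice {c}") for q in QUESTIONS for c in range(5)]
    peers = weakref.WeakSet(choices)  # weak, so dropped instances can be collected
    for choice in choices:
        choice._peers = peers
    return choices
```

Iterating over fetch_choices() and reading choice.question on each instance appends a single entry to query_log, covering all 100 questions. The weak set is a design choice for this sketch: instances that have been discarded and garbage collected no longer count as peers.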

This might seem like a risky thing to do, but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post-filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of results, filtered out basically all of them, and the code is structured so that the filtered-out ones aren't garbage collected.
To cover this rare case, the automatic prefetching can easily be disabled on a per-queryset or per-object basis. That leaves us with a rare downside that can easily be resolved manually, in exchange for a significant general improvement.
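
The trade-off in the post-filtering case can be put into numbers with a rough cost model. This is an illustrative sketch only: default_cost and auto_prefetch_cost are hypothetical helpers that count queries and question rows, assuming lazy loading issues one query per accessed instance and peer prefetching issues one query covering every instance still alive.

```python
def default_cost(kept_question_ids):
    """Lazy loading: one query and one row per surviving instance accessed."""
    n = len(kept_question_ids)
    return {"queries": n, "question_rows": n}


def auto_prefetch_cost(live_question_ids):
    """Peer prefetch: one query for the distinct questions of all live peers."""
    if not live_question_ids:
        return {"queries": 0, "question_rows": 0}
    return {"queries": 1, "question_rows": len(set(live_question_ids))}


# 100 questions with 5 choices each; a Python-side filter keeps only 2 choices.
all_ids = [q for q in range(1, 101) for _ in range(5)]
kept = all_ids[:2]
```

With these inputs, default_cost(kept) is 2 queries for 2 rows. auto_prefetch_cost(all_ids), the case where the filtered-out instances are still alive, is 1 query for 100 rows, while auto_prefetch_cost(kept), the case where they have been collected, is 1 query for 1 row.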

In practice this thing is almost magical to work with. Unless you already have extensive and tightly maintained prefetches everywhere, you get an immediate boost to virtually everything that touches the database, often knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in the work to productize the proof of concept.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Adam

Re: Automatic prefetching in querysets

Luke Plant-2
In reply to this post by Marc Tamlyn

I agree with Marc here that the proposed optimizations are 'magical'. I think when it comes to optimizations like these you simply cannot know in advance whether doing extra queries is going to be an optimization or a pessimization. If I can come up with a single example where it would significantly decrease performance (either memory usage or speed) compared to the default (and I'm sure I can), then I would be strongly opposed to it ever being default behaviour.

Concerning implementing it as an additional QuerySet method like `auto_prefetch()`: I'm not sure what I think. I feel like it could get icky (i.e. increase our technical debt), due to the way it couples things together. I can't imagine ever wanting to use it, though; I would always prefer the manual option.

Luke



On 15/08/17 21:02, Marc Tamlyn wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?

Re: Automatic prefetching in querysets

Sean Brant
I wonder if a solution similar to [1] from the Rails world would satisfy this request. Rather than doing anything 'magical', we instead log when we detect things like accessing a related model that has not been prefetched.
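
One way to picture the logging approach is a descriptor that loads the related object lazily and emits a warning when it does so. This is a minimal sketch under stated assumptions: LazyRelated, its loader argument and the "lazy_related" logger name are hypothetical, and this is neither Django's related-object descriptor nor the tool referenced as [1].

```python
import logging

logger = logging.getLogger("lazy_related")  # hypothetical logger name


class LazyRelated:
    """Descriptor that loads a related object on first access and logs it."""

    def __init__(self, loader):
        self.loader = loader  # callable taking the instance, returning the object

    def __set_name__(self, owner, name):
        self.name = name
        self.cache_attr = f"_{name}_cache"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.cache_attr not in instance.__dict__:
            # Not prefetched: warn so the developer can add a prefetch.
            logger.warning(
                "%s.%s loaded lazily; consider prefetching it",
                type(instance).__name__,
                self.name,
            )
            instance.__dict__[self.cache_attr] = self.loader(instance)
        return instance.__dict__[self.cache_attr]


class Choice:
    # The lambda stands in for a per-instance database lookup.
    question = LazyRelated(lambda self: f"Question {self.question_id}")

    def __init__(self, question_id):
        self.question_id = question_id
```

The first read of choice.question on an instance logs one warning and caches the result; later reads on the same instance are silent. A prefetch would populate the cache attribute up front, so no warning would be logged.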


Re: Automatic prefetching in querysets

Adam Johnson-2
In reply to this post by Josh Smeaton
Adam/Gordon, I'm interested in hearing how these changes led you to discovering stale prefetches?

We removed all the manual prefetches in admin get_queryset() methods, added the auto_prefetch_related call, and regenerated the performance records from django-perf-rec tests that hit the admin classes on the list view, detail view, etc. Some queries disappeared and on inspection it was obvious they were for features since removed e.g. a related field no longer shown on the list view.

On 15 August 2017 at 23:05, Josh Smeaton <[hidden email]> wrote:
I'm in favour of *some* automated way of handling these prefetches, whether it's opt-in or opt-out there should be some mechanism for protection. Preferably with loud logging that directs users to the source of the automated hand-holding so they have an opportunity to disable or fine-tune the queries. Not all Django devs are ORM/Database/Python experts - some are frontend devs just trying to get by. I know that this kind of proposed behaviour would have both saved our site from massive performance issues and likely guided these same devs to the source of potential issues.

Agreed that prefetch_related is strictly better than not, and is acceptable in the absence of select_related.

I think keeping the code back and focussing on the idea first is a good one. I think I'd like to know whether people thought that opt-in or opt-out behaviour would be best?

For me - there are a few cases where automatically prefetching would *maybe* be the wrong thing to do. In the vast majority of cases it'd be better than the default of having nothing.

A few concerns:

- Be careful of `.iterator()` queries (that can't use prefetch)
- Could we warn of reverse M2M/ForeignKey at least?

Adam/Gordon, I'm interested in hearing how these changes led you to discovering stale prefetches?

On Wednesday, 16 August 2017 07:30:10 UTC+10, Adam Johnson wrote:
I'm biased here, Gordon was my boss for nearly three years, 2014-2016.

I'm in favour of adding the auto-prefetching, I've seen it work. It was created around midway through last year and applied to our application's Admin interface, which was covered in tests with django-perf-rec. After adding the automatic prefetch, we not only identified some existing N+1 query problems that hadn't been picked up (they were there in the performance record files, but there were so many queries that humans had missed some when reading them), we also found a number of stale prefetch queries that could be removed because the data they fetched wasn't being used. Additionally not all of the admin interface was perf-rec-tested/optimized so some pages were "just faster" with no extra effort.

I think there's a case for adding it to core - releasing as a third party package would make it difficult to use since it requires changes - mostly small - to QuerySet, ForeignKey, and the descriptor for ForeignKey. Users of such a package would have to use subclasses of all of these, the trickiest being ForeignKey since it triggers migrations... We had the luxury in our codebase of already having these subclasses for other customizations, so it was easier to roll out.

On 15 August 2017 at 20:35, Gordon Wrigley <[hidden email]> wrote:
The warnings you propose would certainly be an improvement on the status quo.
However for that to be a complete solution Django would also need to detect places where there are redundant prefetch_relateds.

Additionally tools like the Admin and DRF would need to provide adequate hooks for inserting these calls.
For example ModelAdmin.get_queryset is not really granular enough as it's used by both the list and detail views which might touch quite different sets of fields. (Although in practice what you generally do is optimize the list view as that's the one that tends to explode)

That aside I sincerely believe that the proposed approach is superior to the current default behavior in the majority of cases and furthermore doesn't fail as badly as the current behavior when it's not appropriate. I expect that if implemented as an option then in time that belief would prove itself.

On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes <[hidden email]> wrote:
Exploding query counts are definitely a pain point in Django, anything to improve that is definitely worth considering. They have been a problem in all Django projects I have seen.

However I think the correct solution is for developers to correctly add select/prefetch calls. There is no general solution for automatically applying them that works for enough cases, and I think adding such a method to querysets would be used incorrectly and too often.

Perhaps a better solution would be for Django to detect these O(n) query cases and display intelligent warnings, with suggestions as to the correct select/prefetch calls to add. When debug mode is enabled we could detect repeated foreign key referencing from the same source.
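
A minimal sketch of what such a warning could look like, in plain Python. LazyLoadTracker, its record() method and the threshold are hypothetical names invented for this example, not part of Django; a real implementation would have to hook into the foreign key descriptor and work out the call site itself.

```python
import warnings
from collections import Counter


class LazyLoadTracker:
    """Hypothetical debug helper: counts lazy FK loads per (model, field, source)."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, model, field, source):
        key = (model, field, source)
        self.counts[key] += 1
        # Warn exactly once, when the count first reaches the threshold.
        if self.counts[key] == self.threshold:
            warnings.warn(
                f"{model}.{field} was lazily loaded {self.threshold} times "
                f"from {source}; consider select_related('{field}') "
                f"or prefetch_related('{field}')",
                RuntimeWarning,
                stacklevel=2,
            )


tracker = LazyLoadTracker(threshold=10)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for _ in range(25):
        # Simulates 25 dereferences of choice.question from one template line.
        tracker.record("Choice", "question", "polls/index.html:12")

print(caught[0].message)
```

Keying on the source as well as the field is what lets the message point at the loop that needs the prefetch rather than just naming the relation.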

On 15 Aug 2017 19:44, "Gordon Wrigley" <[hidden email]> wrote:
Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so 'choice.question' in the examples above) if it doesn't have the value cached from an earlier access, prefetch_related or select_related then Django will automatically perform a db query to fetch it. After that the value will then be cached on the object for any future dereferences.

This automatic fetching is the source of the N+1 query problems and, in my experience, of most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says: for the group of objects fetched by this queryset, whenever one of these automatic foreign key queries happens on one of them, instead of fetching the foreign key for just that one, use the prefetch mechanism to fetch it for all of them.
The presumption being that the vast majority of the time when you access a field on one object from a queryset result, probably you are going to access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so something (admittedly contrived) like:
for choice in Choice.objects.all():
    print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one" direction each of those queryset results will be at most the same size (in rows) as the preceding one and often (due to nulls and duplicates) smaller.

I do not propose touching reverse foreign key or many2many fields as the generated queries could request substantially more rows from the DB than the original query and it's not at all clear how this mechanism would sanely interact with filtering etc. So this is purely about the forward direction of foreign keys.
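
The mechanism described above can be sketched without Django at all. This is a simulation under stated assumptions, not the proof of concept itself: FakeDB, Row, FOREIGN_KEYS and fetch_all are invented for the example, the "database" is a dict, and peers are held by strong reference for simplicity.

```python
class FakeDB:
    """Stand-in for a database; counts simulated round trips."""

    def __init__(self, tables):
        self.tables = tables  # {table: {pk: {column: value}}}
        self.queries = 0

    def select_all(self, table):
        self.queries += 1
        return [dict(row, id=pk) for pk, row in self.tables[table].items()]

    def select_in(self, table, pks):
        self.queries += 1
        return {pk: dict(self.tables[table][pk], id=pk) for pk in sorted(pks)}


# Hypothetical schema: table -> {foreign key attribute: target table}.
FOREIGN_KEYS = {
    "choice": {"question": "question"},
    "question": {"author": "author"},
    "author": {},
}


class Row:
    """A fetched object that remembers the peers it was fetched with."""

    def __init__(self, db, table, data, peers):
        self._db = db
        self._table = table
        self._data = data
        self._peers = peers
        self._cache = {}

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        targets = FOREIGN_KEYS[self._table]
        if name in targets:
            if name not in self._cache:
                self._load_for_peers(name, targets[name])
            return self._cache[name]
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def _load_for_peers(self, name, target):
        # One query covers every peer that has not loaded this relation yet.
        column = name + "_id"
        pending = [peer for peer in self._peers if name not in peer._cache]
        pks = {peer._data[column] for peer in pending} - {None}
        fetched = self._db.select_in(target, pks)
        group = []  # the related rows become peers of each other (nesting)
        related = {
            pk: Row(self._db, target, data, group) for pk, data in fetched.items()
        }
        group.extend(related.values())
        for peer in pending:
            peer._cache[name] = related.get(peer._data[column])


def fetch_all(db, table):
    peers = []
    peers.extend(Row(db, table, data, peers) for data in db.select_all(table))
    return peers


db = FakeDB({
    "author": {1: {"name": "Ann"}, 2: {"name": "Bob"}},
    "question": {
        1: {"author_id": 1, "text": "Q1"},
        2: {"author_id": 2, "text": "Q2"},
        3: {"author_id": None, "text": "Q3"},
    },
    "choice": {n: {"question_id": (n % 3) + 1, "text": f"C{n}"} for n in range(1, 7)},
})

names = []
for choice in fetch_all(db, "choice"):
    author = choice.question.author
    names.append(author.name if author else None)

print(db.queries, names)
```

Six choices resolve to three questions and two authors, so each step fetches no more rows than the one before it, and the null author simply shrinks the last query.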

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn <[hidden email]> wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley <[hidden email]> wrote:
I'd like to discuss automatic prefetching in querysets. Specifically automatically doing prefetch_related where needed without the user having to request it.

For context consider these three snippets using the Question & Choice models from the tutorial when there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB
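
The three query shapes can be reproduced outside Django with the standard library's sqlite3 module. This is a sketch that issues roughly the SQL each pattern implies, not the SQL Django actually generates; the table layout and the helper names (build_db, CountingDB, measure) are assumptions for the example.

```python
import sqlite3


def build_db(questions=100, choices_per_question=5):
    """Build an in-memory database shaped like the tutorial's polls models."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, question_text TEXT)")
    conn.execute(
        "CREATE TABLE choice ("
        "id INTEGER PRIMARY KEY, question_id INTEGER, choice_text TEXT)"
    )
    for q in range(1, questions + 1):
        conn.execute("INSERT INTO question VALUES (?, ?)", (q, f"Question {q}"))
        conn.executemany(
            "INSERT INTO choice (question_id, choice_text) VALUES (?, ?)",
            [(q, f"Choice {c}") for c in range(choices_per_question)],
        )
    return conn


class CountingDB:
    """Runs queries and tallies how many ran and how many rows came back."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0
        self.rows = 0

    def run(self, sql, params=()):
        rows = self.conn.execute(sql, params).fetchall()
        self.queries += 1
        self.rows += len(rows)
        return rows


CHOICES_SQL = "SELECT id, question_id, choice_text FROM choice"


def default_pattern(db):
    # One query for the choices, then one more per choice for its question.
    for _id, question_id, _text in db.run(CHOICES_SQL):
        db.run("SELECT id, question_text FROM question WHERE id = ?", (question_id,))


def prefetch_pattern(db):
    # One query for the choices, one IN query for the distinct questions.
    choices = db.run(CHOICES_SQL)
    ids = sorted({question_id for _id, question_id, _text in choices})
    marks = ", ".join("?" for _ in ids)
    db.run(f"SELECT id, question_text FROM question WHERE id IN ({marks})", ids)


def join_pattern(db):
    # A single query; every returned row carries its question's columns.
    db.run(
        "SELECT c.id, c.choice_text, q.id, q.question_text "
        "FROM choice c JOIN question q ON q.id = c.question_id"
    )


def measure(pattern):
    db = CountingDB(build_db())
    pattern(db)
    return db.queries, db.rows


for pattern in (default_pattern, prefetch_pattern, join_pattern):
    print(pattern.__name__, measure(pattern))
```

The row totals count what each query returns, so the join reports 500 rows even though each of them repeats the question data alongside the choice.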

I've included select_related for completeness, I'm not going to propose changing anything about its use. There are places where it is the best choice and in those places it will still be up to the user to request it. I will note that anywhere select_related is optimal prefetch_related is still better than the default and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know enough about the database and / or ORM to do otherwise.
Experienced users will also often produce this because it's not always obvious what fields will and won't be used and subsequently what should be prefetched.
Additionally that list will change over time. A small change to a template to display an extra field can result in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. Tools like django-perf-rec (which I was involved in creating) and nplusone exist in part to flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also produce queries like this because it's very difficult for them to know what needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but obviously it needs to be proved first so for now I'd suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in production. I'm not posting it yet because I'd like to focus on desired behavior rather than implementation details. But in summary, what it does is when accessing a missing field on a model, rather than fetching it just for that instance, it runs a prefetch_related query to fetch it for all peer instances that were fetched in the same queryset. So in the example above it prefetches all Questions in one query.

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of results, filtered basically all of them and the code is structured so that the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on a per queryset or per object basis. Leaving us with a rare downside that can easily be manually resolved in exchange for a significant general improvement.

In practice this thing is almost magical to work with. Unless you already have extensive and tightly maintained prefetches everywhere you get an immediate boost to virtually everything that touches the database, often knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in the work to productize the proof of concept.

--
Adam

Re: Automatic prefetching in querysets

Adam Johnson-2
I agree with Marc here that the proposed optimizations are 'magical'. I think when it comes to optimizations like these you simply cannot know in advance whether doing extra queries is going to be an optimization or a pessimization.

I think Django automatically fetching data from the database on access is already magical; it certainly surprises new users (in good and bad ways) when they first come across it, especially if they've only worked with SQL before and not any other ORMs.

This is not doing "extra" queries, just fetching a bit more data. In the most likely case, there are far fewer queries - N+1 will instead become 2. I think it's a reasonable assumption that if choices_queryset[0].question is accessed, choices_queryset[1].question up to choices_queryset[n].question will be accessed too. The current behaviour is effectively assuming that it's completely unlikely.

--
Adam

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMyDDM1rQqEG2xH8SBifiJ19dOTVPwa6ztQ-wE8aYjBRg-WPkA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Automatic prefetching in querysets

Anthony King
In reply to this post by Luke Plant-2
Automatically prefetching is something I feel should be avoided. 

A common gripe I have with ORMs is they hide what's actually happening with the database, resulting in beginners-going-on-intermediates building libraries/systems that don't scale well. 

We have several views in a dashboard where a relation may be accessed once or twice while iterating over a large, Python-filtered queryset.
Prefetching this relation based on the original queryset has the potential to add around 5 seconds to the response time (probably more; that table has doubled in size since I last measured it).

I feel it would be better to optimise for your use case, as opposed to trying to prevent uncalled-for behaviour.
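
To make the trade-off in this scenario concrete, here is a rough back-of-the-envelope sketch in plain Python. The names and the numbers are hypothetical, not measurements from the dashboard described above. It only counts queries and related rows, assuming that an automatic prefetch loads the relation for every row of the original queryset on first access.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    queries: int       # extra queries issued for the relation
    related_rows: int  # related rows pulled from the database (upper bound)


def lazy_cost(accessed: int) -> Cost:
    """Default behaviour: one query and one row per relation access."""
    return Cost(queries=accessed, related_rows=accessed)


def auto_prefetch_cost(fetched: int, accessed: int) -> Cost:
    """Assumed auto-prefetch: the first access loads the relation for
    every instance in the original queryset, whether it is used or not."""
    if accessed == 0:
        return Cost(queries=0, related_rows=0)
    return Cost(queries=1, related_rows=fetched)


# Hypothetical numbers: 50,000 rows fetched, 2 survive the Python filter.
print(lazy_cost(accessed=2))
print(auto_prefetch_cost(fetched=50_000, accessed=2))
```

Under these assumptions the prefetch saves one query but may load tens of thousands of rows that are never read, which is the kind of regression the post is worried about.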



On Aug 15, 2017 23:15, "Luke Plant" <[hidden email]> wrote:

I agree with Marc here that the proposed optimizations are 'magical'. I think when it comes to optimizations like these you simply cannot know in advance whether doing extra queries is going to be an optimization or a pessimization. If I can come up with a single example where it would significantly decrease performance (either memory usage or speed) compared to the default (and I'm sure I can), then I would be strongly opposed to it ever being default behaviour.

Concerning implementing it as an additional QuerySet method like `auto_prefetch()` - I'm not sure what I think. I feel like it could get icky (i.e. increase our technical debt), due to the way it couples things together. I can't imagine ever wanting to use it, though; I would always prefer the manual option.

Luke



On 15/08/17 21:02, Marc Tamlyn wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?
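
On the question of how possible prefetches could be identified: the original proposal describes fetching a missing relation for all peer instances loaded by the same queryset. The following is a minimal simulation of that idea in plain Python. It is not the proof of concept mentioned in the proposal (which was not posted) and it uses no Django APIs. `FakeDB`, `Choice.peers` and `load_choices` are hypothetical names invented for illustration, and nested prefetches are not addressed.

```python
class FakeDB:
    """Stand-in for a database that counts queries (hypothetical)."""

    def __init__(self, questions):
        self.questions = questions  # {id: question_text}
        self.query_count = 0

    def get_question(self, qid):
        self.query_count += 1
        return self.questions[qid]

    def get_questions(self, qids):
        self.query_count += 1
        return {qid: self.questions[qid] for qid in qids}


class Choice:
    def __init__(self, db, question_id, text, peers=None):
        self.db = db
        self.question_id = question_id
        self.text = text
        self.peers = peers      # instances loaded by the same queryset
        self._question = None   # cache for the related object

    @property
    def question(self):
        if self._question is None:
            if self.peers is None:
                # Default behaviour: one query for this instance only.
                self._question = self.db.get_question(self.question_id)
            else:
                # Auto-prefetch: one query for every peer still missing it.
                missing = [c for c in self.peers if c._question is None]
                rows = self.db.get_questions({c.question_id for c in missing})
                for c in missing:
                    c._question = rows[c.question_id]
        return self._question


def load_choices(db, auto_prefetch):
    """Simulate Choice.objects.all(): 100 questions with 5 choices each."""
    db.query_count += 1  # the query that loads the choices themselves
    peers = [] if auto_prefetch else None
    choices = [Choice(db, q, f"choice {n}", peers)
               for q in range(100) for n in range(5)]
    if auto_prefetch:
        peers.extend(choices)
    return choices


def count_queries(auto_prefetch):
    db = FakeDB({q: f"question {q}" for q in range(100)})
    for choice in load_choices(db, auto_prefetch):
        choice.question  # touch the relation, as the loop in the proposal does
    return db.query_count


print(count_queries(auto_prefetch=False))  # 501
print(count_queries(auto_prefetch=True))   # 2
```

The counts match the 501 and 2 queries quoted in the proposal for the default and prefetch_related cases.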

Re: Automatic prefetching in querysets

Josh Smeaton
I believe we should be optimising for the **common** use case, not expecting everyone to be experts with the ORM.

> If I can come up with a single example where it would significantly decrease performance (either memory usage or speed) compared to the default (and I'm sure I can), then I would be strongly opposed to it ever being default behaviour. 

The status quo is already one where thousands of users and sites are doing the non-optimal thing because we're choosing to be conservative and have users opt in to the optimal behaviour. A massive complaint against Django is how easy it is for users to build in 1+N behaviour. Django is supposed to abstract the database away (so much so that we avoid SQL-related terms in our queryset methods), yet we expect users to know about and optimise for joins, one of the more fundamental concepts.

I'd be more in favour of throwing an error on non-select-related-foreign-key-access than what we're currently doing which is a query for each access.
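
As a rough illustration of what "throwing an error" on an unloaded relation could look like, here is a plain-Python descriptor sketch. `StrictRelation` and `RelationNotLoaded` are hypothetical names, not Django APIs, and assigning to the attribute stands in for whatever select_related or prefetch_related would have loaded.

```python
class RelationNotLoaded(Exception):
    """Raised when a relation is accessed without being loaded up front."""


class StrictRelation:
    """Descriptor that refuses to lazy-load (hypothetical, not a Django API)."""

    def __set_name__(self, owner, name):
        self.name = name
        self.cache_name = "_" + name + "_cache"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            return instance.__dict__[self.cache_name]
        except KeyError:
            raise RelationNotLoaded(
                f"{type(instance).__name__}.{self.name} was accessed without "
                "being loaded; request it explicitly when building the query"
            ) from None

    def __set__(self, instance, value):
        instance.__dict__[self.cache_name] = value


class Choice:
    question = StrictRelation()

    def __init__(self, choice_text):
        self.choice_text = choice_text


loaded = Choice("Not much")
loaded.question = "What's new?"  # stands in for an explicit select/prefetch
print(loaded.question)

unloaded = Choice("The sky")
try:
    unloaded.question
except RelationNotLoaded as exc:
    print("error:", exc)
```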

The only options users currently have for monitoring poor behaviour are:

1. Add logging to django.db.backends
2. Add django-debug-toolbar
3. Investigate page slowdowns
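
For item 1, a settings fragment along the following lines is one possibility. It assumes the standard `LOGGING` setting and uses Django's `django.db.backends` logger, which is where executed SQL is logged at DEBUG level; Django only records these statements when `DEBUG = True`. The handler name `console` is an arbitrary choice.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Arbitrary handler name; writes log records to stderr.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Logger on which Django emits each executed SQL statement.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

# Django applies settings.LOGGING itself; calling dictConfig directly here
# only makes the snippet runnable on its own.
logging.config.dictConfig(LOGGING)
```

This makes every query visible, but someone still has to read the output and notice the repeated pattern.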

Here's a bunch of ways that previously tuned queries can "go bad":

1. A model's `__str__` method is updated to include a related field
2. A template uses a previously unused related field
3. A report uses a previously unused related field
4. A ModelAdmin adds a previously unused related field

I think better questions to ask are:

- How many people have had their day/site ruined because we think auto-prefetching is too magical?
- How many people would have their day/site ruined because we think auto-prefetching is the better default?

If we were introducing a new ORM, I think the answer to the above would be obvious given what we know of Django use today.

What I'd propose:

1. (optional) A global setting to disable autoprefetching
2. An opt out per queryset
3. (optional) An opt out per Meta?
4. Logging any autoprefetches - perhaps as a warning?

More experienced Django users that do not want this behaviour are going to know about a global setting and can opt in to the old behaviour rather easily. Newer users that do not know about select/prefetch_related or these settings will fall into the new behaviour by default.

It's unreasonable to expect every user of Django to learn the ins and outs of all queryset methods. I'm probably considered a Django ORM expert, and I still sometimes write queries that are non-optimal or *become* non-optimal after changes in unrelated areas. At an absolute minimum we should be screaming and shouting when this happens. But we can also fix the issue while complaining, and help guide users into correct behaviour.


Re: Automatic prefetching in querysets

Curtis Maloney-2
The 2 goals of a framework:
- protect you from the tedious things
- protect you from the dangerous things

N+1 queries would be in the 'dangerous' category, IMHO, and 'detecting
causes of N+1 queries' is in the 'tedious'.

If we can at least add some DEBUG flagged machinery to detect and warn
of potential prefetch/select candidates, it would be a big win.
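
One way such detection might work, sketched in plain Python under stated assumptions: take a list of executed SQL strings, collapse numeric literals so that queries differing only by id look the same, and warn when one shape repeats many times. `normalise`, `warn_on_repeats` and the threshold of 10 are hypothetical choices. In a Django project with `DEBUG = True` the input could plausibly come from `django.db.connection.queries`, but that wiring is not shown here.

```python
import re
import warnings
from collections import Counter


def normalise(sql):
    """Replace numeric literals so queries differing only by id match."""
    return re.sub(r"\b\d+\b", "?", sql)


def warn_on_repeats(queries, threshold=10):
    """Warn once per query shape seen at least `threshold` times."""
    counts = Counter(normalise(sql) for sql in queries)
    suspects = {shape: n for shape, n in counts.items() if n >= threshold}
    for shape, n in suspects.items():
        warnings.warn(f"possible N+1: {n} queries shaped like {shape!r}")
    return suspects


# Simulated query log for the 500-choice example from the proposal.
log = ['SELECT * FROM "polls_choice"'] + [
    f'SELECT * FROM "polls_question" WHERE "id" = {i}' for i in range(1, 501)
]
print(warn_on_repeats(log))
```

A check like this only flags candidates; deciding between select_related and prefetch_related would still be up to the developer.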

--
C

On 16/08/17 09:26, Josh Smeaton wrote:

> I believe we should be optimising for the **common** use case, not
> expecting everyone to be experts with the ORM.
>
>  > If I can come up with a single example where it would significantly
> decrease performance (either memory usage or speed) compared to the
> default (and I'm sure I can), then I would be strongly opposed to it
> ever being default behaviour.
>
> The status quo is already one where thousands of users and sites are
> doing the non-optimal thing because we're choosing to be conservative
> and have users opt-in to the optimal behaviour. A massive complaint
> against Django is how easy it is for users to build in 1+N behaviour.
> Django is supposed to abstract the database away (so much so that we
> avoid SQL related terms in our queryset methods), yet one of the more
> fundamental concepts such as joins we expect users to know about and
> optimise for.
>
> I'd be more in favour of throwing an error on
> non-select-related-foreign-key-access than what we're currently doing
> which is a query for each access.
>
> The only options users currently have of monitoring poor behaviour is:
>
> 1. Add logging to django.db.models
> 2. Add django-debug-toolbar
> 3. Investigate page slow downs
>
> Here's a bunch of ways that previously tuned queries can "go bad":
>
> 1. A models `__str__` method is updated to include a related field
> 2. A template uses a previously unused related field
> 3. A report uses a previously unused related field
> 4. A ModelAdmin adds a previously unused related field
>
> I think a better question to ask is:
>
> - How many people have had their day/site ruined because we think
> auto-prefetching is too magical?
> - How many people would have their day/site ruined because we think
> auto-prefetching is the better default?
>
> If we were introducing a new ORM, I think the above answer would be
> obvious given what we know of Django use today.
>
> What I'd propose:
>
> 1. (optional) A global setting to disable autoprefetching
> 2. An opt out per queryset
> 3. (optional) An opt out per Meta?
> 4. Logging any autoprefetches - perhaps as a warning?
>
> More experienced Django users that do not want this behaviour are going
> to know about a global setting and can opt in to the old behaviour
> rather easily. Newer users that do not know about
> select/prefetch_related or these settings will fall into the new
> behaviour by default.
>
> It's unreasonable to expect every user of django learn the ins and outs
> of all queryset methods. I'm probably considered a django orm expert,
> and I still sometimes write queries that are non-optimal or *become*
> non-optimal after changes in unrelated areas. At an absolute minimum we
> should be screaming and shouting when this happens. But we can also fix
> the issue while complaining, and help guide users into correct behaviour.
>
>
> On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
>
>     Automatically prefetching is something I feel should be avoided.
>
>     A common gripe I have with ORMs is they hide what's actually
>     happening with the database, resulting in
>     beginners-going-on-intermediates building libraries/systems that
>     don't scale well.
>
>     We have several views in a dashboard, where a relation may be
>     accessed once or twice while iterating over a large python filtered
>     queryset.
>     Prefetching this relation based on the original queryset has the
>     potential to add around 5 seconds to the response time (probably
>     more, that table has doubled in size since I last measured it).
>
>     I feel it would be better to optimise for your usecase, as apposed
>     to try to prevent uncalled-for behaviour.
>
>
>
>     On Aug 15, 2017 23:15, "Luke Plant" <[hidden email]> wrote:
>
>         I agree with Marc here that the proposed optimizations are
>         'magical'. I think when it comes to optimizations like these you
>         simply cannot know in advance whether doing extra queries is
>         going to be an optimization or a pessimization. If I can come
>         up with a single example where it would significantly decrease
>         performance (either memory usage or speed) compared to the
>         default (and I'm sure I can), then I would be strongly opposed
>         to it ever being default behaviour.
>
>         Concerning implementing it as an additional QuerySet method
>         like `auto_prefetch()` - I'm not sure what I think, I feel like
>         it could get icky (i.e. increase our technical debt), due to the
>         way it couples things together. I can't imagine ever wanting to
>         use it, though, I would always prefer the manual option.
>
>         Luke
>
>
>
>         On 15/08/17 21:02, Marc Tamlyn wrote:
>>         Hi Gordon,
>>
>>         Thanks for the suggestion.
>>
>>         I'm not a fan of adding a layer that tries to be this clever.
>>         How would possible prefetches be identified? What happens when
>>         an initial loop in a view requires one prefetch, but a
>>         subsequent loop in a template requires some other prefetch?
>>         What about nested loops resulting in nested prefetches? Code
>>         like this is almost guaranteed to break unexpectedly in
>>         multiple ways. Personally, I would argue that correctly
>>         setting up and maintaining appropriate prefetches and selects
>>         is a necessary part of working with an ORM.
>>
>>         Do you know of any other ORMs which attempt similar magical
>>         optimisations? How do they go about identifying the cases
>>         where it is necessary?
>>


Re: Automatic prefetching in querysets

Cristiano Coelho
In reply to this post by tolomea
I would rather have warnings as well. Adding more magical behavior is bad and might even degrade performance in some cases. Automatically selecting a bunch of data that "might" be used is bad, and especially considering how slow Python is, accidentally loading and building 1k+ objects when maybe only one of them is used would be as bad as doing 1k+ queries.

If the systems you are building are that large and complicated, you can't have people with zero SQL knowledge working on them either! There are so many things to tweak: indexes, data denormalization, proper joins here and there, unique constraints, locks and race conditions. Someone attempting to code something that's not a blog or hello world really needs to know a bit about all of that.


On Tuesday, 15 August 2017, 6:44:19 (UTC-3), Gordon Wrigley wrote:
I'd like to discuss automatic prefetching in querysets. Specifically automatically doing prefetch_related where needed without the user having to request it.

For context consider these three snippets using the Question & Choice models from the tutorial <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> when there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB
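
As a rough illustration of where the three counts above come from, here is a minimal pure-Python sketch. It does not use Django at all: FakeDB and its methods are hypothetical stand-ins that only count queries and question rows, under the stated assumption of 100 questions with 5 choices each.

```python
from collections import namedtuple

Question = namedtuple("Question", "id question_text")
Choice = namedtuple("Choice", "id question_id choice_text")

# 100 questions with 5 choices each, as in the example above.
QUESTIONS = {q: Question(q, f"Question {q}") for q in range(100)}
CHOICES = [Choice(c, c // 5, f"Choice {c}") for c in range(500)]


class FakeDB:
    """Hypothetical stand-in for a database; counts queries and question rows."""

    def __init__(self):
        self.queries = 0
        self.question_rows = 0

    def all_choices(self):
        self.queries += 1
        return list(CHOICES)

    def question_by_id(self, question_id):
        self.queries += 1
        self.question_rows += 1
        return QUESTIONS[question_id]

    def questions_by_ids(self, ids):
        self.queries += 1
        rows = [QUESTIONS[i] for i in set(ids)]
        self.question_rows += len(rows)
        return {q.id: q for q in rows}

    def choices_joined_with_questions(self):
        self.queries += 1
        self.question_rows += len(CHOICES)  # one question row per joined choice row
        return [(c, QUESTIONS[c.question_id]) for c in CHOICES]


def default_access(db):
    for choice in db.all_choices():
        db.question_by_id(choice.question_id)  # one extra query per choice
    return db.queries, db.question_rows


def prefetch_style(db):
    choices = db.all_choices()
    db.questions_by_ids(c.question_id for c in choices)  # one batched lookup
    return db.queries, db.question_rows


def select_style(db):
    db.choices_joined_with_questions()  # a single joined query
    return db.queries, db.question_rows


print(default_access(FakeDB()))  # (501, 500)
print(prefetch_style(FakeDB()))  # (2, 100)
print(select_style(FakeDB()))    # (1, 500)
```

Each tuple is (db queries, question rows fetched), matching the figures quoted for the three snippets.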

I've included select_related for completeness, I'm not going to propose changing anything about its use. There are places where it is the best choice and in those places it will still be up to the user to request it. I will note that anywhere select_related is optimal prefetch_related is still better than the default and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know enough about the database and / or ORM to do otherwise.
Experienced users will also often produce this because it's not always obvious what fields will and won't be used and subsequently what should be prefetched.
Additionally that list will change over time. A small change to a template to display an extra field can result in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec> (which I was involved in creating) and nplusone <https://github.com/jmcarp/nplusone> exist in part to flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also produce queries like this because it's very difficult for them to know what needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but obviously it needs to be proved first so for now I'd suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in production. I'm not posting it yet because I'd like to focus on desired behavior rather than implementation details. In summary: when a missing field is accessed on a model, rather than fetching it just for that instance, it runs a prefetch_related query to fetch it for all peer instances that were fetched in the same queryset. So in the example above it prefetches all Questions in one query.
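
The proof of concept itself is not shown in the thread, so the following is only a guess at the general shape of the mechanism described above, written as a pure-Python sketch without Django. Every name here (FakeQuestionTable, load_choices, the _peers attribute) is hypothetical. The idea it illustrates: instances produced by the same query share a peer list, and the first access to a missing relation fills it in for all peers with one batched lookup.

```python
class Question:
    def __init__(self, pk, text):
        self.pk = pk
        self.text = text


class FakeQuestionTable:
    """Hypothetical stand-in for the database; counts queries issued."""

    def __init__(self, questions):
        self._rows = {q.pk: q for q in questions}
        self.queries = 0

    def fetch_many(self, pks):
        self.queries += 1
        return {pk: self._rows[pk] for pk in set(pks)}


class Choice:
    def __init__(self, question_id, table):
        self.question_id = question_id
        self._table = table
        self._peers = [self]   # instances loaded by the same "queryset"
        self._question = None  # cache for the related object

    @property
    def question(self):
        if self._question is None:
            # Cache miss: fetch the relation for every peer in one query,
            # instead of only for this instance.
            found = self._table.fetch_many(p.question_id for p in self._peers)
            for peer in self._peers:
                if peer._question is None:
                    peer._question = found[peer.question_id]
        return self._question


def load_choices(table, question_ids):
    """Hypothetical 'queryset evaluation': all results share one peer list."""
    choices = [Choice(qid, table) for qid in question_ids]
    for choice in choices:
        choice._peers = choices
    return choices


table = FakeQuestionTable([Question(pk, f"Question {pk}") for pk in range(100)])
choices = load_choices(table, [n // 5 for n in range(500)])
texts = [c.question.text for c in choices]
print(table.queries)  # 1 (the query that loads the choices is not counted here)
```

Note that the shared peer list keeps every result alive for as long as any one of them is referenced. A real implementation would presumably need something like weak references to avoid that, which is related to the garbage collection caveat raised in the post.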

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post-filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of results, filtered out basically all of them and the code is structured so that the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on a per queryset or per object basis. That leaves us with a rare downside that can easily be manually resolved, in exchange for a significant general improvement.
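
To put rough numbers on that trade-off, here is a small cost model in plain Python. It is a simplification under stated assumptions, not a measurement: relation_cost is a hypothetical helper, every result is assumed to point at a distinct related row, and only queries and rows for the related objects are counted.

```python
def relation_cost(foreign_keys, accessed, auto_prefetch):
    """Return (queries, rows) spent loading related objects.

    foreign_keys: one foreign key per result of the original query.
    accessed: indexes of the results whose relation is actually read
              after filtering in Python.
    """
    accessed = list(accessed)
    if not accessed:
        return (0, 0)
    if auto_prefetch:
        # The first access loads the relation for every peer in one query.
        return (1, len(set(foreign_keys)))
    # Default behaviour: one single-row query per access.
    return (len(accessed), len(accessed))


fks = list(range(10_000))  # 10,000 results, each with its own related row

# Heavy post-filtering: only one result survives and has its relation read.
print(relation_cost(fks, [42], auto_prefetch=True))    # (1, 10000)
print(relation_cost(fks, [42], auto_prefetch=False))   # (1, 1)

# No post-filtering: every result has its relation read.
print(relation_cost(fks, range(len(fks)), auto_prefetch=True))   # (1, 10000)
print(relation_cost(fks, range(len(fks)), auto_prefetch=False))  # (10000, 10000)
```

In this model, disabling automatic prefetching for a queryset corresponds to passing auto_prefetch=False, which only pays off in the first scenario.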

In practice this thing is almost magical to work with. Unless you already have extensive and tightly maintained prefetches everywhere you get an immediate boost to virtually everything that touches the database, often knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in the work to productize the proof of concept.


Re: Automatic prefetching in querysets

Alexander Hill
I think this is an excellent suggestion.

It seems generally accepted in this thread that although there are cases where this would hurt performance, it would on average solve more problems than it creates. The debate seems to be more whether or not it's "right" for the ORM to behave in this magical way. With that in mind...

It won't affect experienced users. They'll read the release notes, see that this change has been implemented, and either go and delete a bunch of prefetch_related() calls, grumble a bit and turn auto-prefetch off globally, or just file it away as another fact they know about the Django ORM.

For beginners, this can defer the pain of learning about N+1 problems, potentially forever depending on the scale of the project. Ultimately Django's job isn't to teach hard lessons about SQL gotchas, it's to make it easy to make a nice website. This proposal would reduce both average load time of Django pages and debugging time, at the cost of the average Django developer being a little more ignorant about what queries the ORM generates. I think that's a good trade and in line with the goals of the project.

Django's ORM isn't SQLAlchemy - it's built on high-level concepts, designed to be relatively beginner-friendly, and already pretty magical. select_related() and prefetch_related() are somewhat awkward plugs for a leaky abstraction, and IMO this is just a better plug.

Alex




Re: Automatic prefetching in querysets

Brice PARENT-2
In reply to this post by Josh Smeaton

I almost agree with you... But I would make the behaviour an opt-in instead of an opt-out. I wouldn't like such a behaviour to be inserted with a simple version update.

What I wouldn't want is to give free optimisations to people who aren't ORM experts at the cost of creating major unexpected or unneeded load on already well-optimized code bases (for which there would be close to no gain, as the prefetches already exist where needed and any prefetches that were added would be unnecessary).

Making it opt-in allows:

- for power ORM users not to use it at all without doing anything. We have to keep in mind that those SQL experts may not be the ones who do the upgrades, which could lead to an unexpected loss of efficiency without anyone knowing where it comes from. As this functionality doesn't exist yet, probably no one has ever written unit tests to guarantee that there is no unnecessary prefetch, which is the only way a non-expert would notice that the upgrade is introducing a problem.

- for developers looking for better performance to turn it on, but only in debug mode and local development, to understand where they need to prefetch related fields. When their code base is optimal, they can unset this option and remove the magic and any unwanted side effects. This would of course need to be documented in the pages about how to optimize and get better performance with Django.

- for new users, or those creating application mockups and proofs of concept, to turn it on (the tutorial would talk a bit about it) and just use it, so they don't have to care about optimizations they don't need yet.

- Brice



On 16/08/17 at 01:26, Josh Smeaton wrote:
I believe we should be optimising for the **common** use case, not expecting everyone to be experts with the ORM.

> If I can come up with a single example where it would significantly decrease performance (either memory usage or speed) compared to the default (and I'm sure I can), then I would be strongly opposed to it ever being default behaviour. 

The status quo is already one where thousands of users and sites are doing the non-optimal thing because we're choosing to be conservative and have users opt in to the optimal behaviour. A massive complaint against Django is how easy it is for users to build in 1+N behaviour. Django is supposed to abstract the database away (so much so that we avoid SQL related terms in our queryset methods), yet we expect users to know about and optimise for one of the more fundamental concepts, joins.

I'd be more in favour of throwing an error on non-select-related-foreign-key-access than what we're currently doing which is a query for each access.

The only options users currently have for monitoring poor behaviour are:

1. Add logging to django.db.models
2. Add django-debug-toolbar
3. Investigate page slow downs
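
For the first option in the list above, a settings fragment along the following lines is one common way to see every query during development. Treat it as a sketch: the logger Django uses for SQL statements is `django.db.backends`, and it only records queries when DEBUG is True. The handler name "console" is an arbitrary choice for this example.

```python
import logging.config

# Would normally live in settings.py as the LOGGING setting.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Every SQL statement is logged here at DEBUG level.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

# Django applies settings.LOGGING itself; calling dictConfig directly
# here only demonstrates that the dictionary is a valid configuration.
logging.config.dictConfig(LOGGING)
```

With this in place an N+1 pattern shows up as a long run of near-identical SELECT statements in the console output.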

Here's a bunch of ways that previously tuned queries can "go bad":

1. A model's `__str__` method is updated to include a related field
2. A template uses a previously unused related field
3. A report uses a previously unused related field
4. A ModelAdmin adds a previously unused related field

I think a better question to ask is:

- How many people have had their day/site ruined because we think auto-prefetching is too magical?
- How many people would have their day/site ruined because we think auto-prefetching is the better default?

If we were introducing a new ORM, I think the above answer would be obvious given what we know of Django use today.

What I'd propose:

1. (optional) A global setting to disable autoprefetching
2. An opt out per queryset
3. (optional) An opt out per Meta?
4. Logging any autoprefetches - perhaps as a warning?
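
One way the four items above could fit together is sketched below in plain Python. Nothing here is an existing Django API: the function names, the argument names and the precedence order (queryset over Meta over global) are all assumptions made for illustration.

```python
import warnings


def autoprefetch_enabled(global_default=True, meta_option=None, queryset_option=None):
    """Resolve the hypothetical switches, most specific first.

    None means "not set at this level".
    """
    for option in (queryset_option, meta_option):
        if option is not None:
            return option
    return global_default


def note_autoprefetch(model_name, field_name):
    """Item 4: surface each automatic prefetch so it can be made explicit."""
    warnings.warn(
        f"auto-prefetched {model_name}.{field_name}; "
        f"consider prefetch_related('{field_name}')",
        RuntimeWarning,
        stacklevel=2,
    )


print(autoprefetch_enabled())                                         # True
print(autoprefetch_enabled(global_default=False))                     # False
print(autoprefetch_enabled(meta_option=False))                        # False
print(autoprefetch_enabled(meta_option=False, queryset_option=True))  # True
note_autoprefetch("Choice", "question")
```

Using the standard warnings machinery would let a project escalate the warning to an error in its test suite, which is close to the "screaming and shouting" asked for further down.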

More experienced Django users that do not want this behaviour are going to know about a global setting and can opt in to the old behaviour rather easily. Newer users that do not know about select/prefetch_related or these settings will fall into the new behaviour by default.

It's unreasonable to expect every user of Django to learn the ins and outs of all queryset methods. I'm probably considered a Django ORM expert, and I still sometimes write queries that are non-optimal or *become* non-optimal after changes in unrelated areas. At an absolute minimum we should be screaming and shouting when this happens. But we can also fix the issue while complaining, and help guide users into correct behaviour.


On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
Automatically prefetching is something I feel should be avoided. 

A common gripe I have with ORMs is they hide what's actually happening with the database, resulting in beginners-going-on-intermediates building libraries/systems that don't scale well. 

We have several views in a dashboard, where a relation may be accessed once or twice while iterating over a large Python-filtered queryset.
Prefetching this relation based on the original queryset has the potential to add around 5 seconds to the response time (probably more, that table has doubled in size since I last measured it). 

I feel it would be better to optimise for your use case, as opposed to trying to prevent uncalled-for behaviour.



On Aug 15, 2017 23:15, "Luke Plant" <L.Pla...@...> wrote:

I agree with Marc here that the proposed optimizations are 'magical'. I think when it comes to optimizations like these you simply cannot know in advance whether doing extra queries is going to be an optimization or a pessimization. If I can come up with a single example where it would significantly decrease performance (either memory usage or speed) compared to the default (and I'm sure I can), then I would be strongly opposed to it ever being default behaviour.

Concerning implementing it as an additional QuerySet method like `auto_prefetch()`: I'm not sure what I think. I feel like it could get icky (i.e. increase our technical debt) due to the way it couples things together. I can't imagine ever wanting to use it, though; I would always prefer the manual option.

Luke



On 15/08/17 21:02, Marc Tamlyn wrote:
Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would possible prefetches be identified? What happens when an initial loop in a view requires one prefetch, but a subsequent loop in a template requires some other prefetch? What about nested loops resulting in nested prefetches? Code like this is almost guaranteed to break unexpectedly in multiple ways. Personally, I would argue that correctly setting up and maintaining appropriate prefetches and selects is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations? How do they go about identifying the cases where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley <gordon....@...> wrote:
I'd like to discuss automatic prefetching in querysets. Specifically automatically doing prefetch_related where needed without the user having to request it.

For context consider these three snippets using the Question & Choice models from the tutorial (https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models) when there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB
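
The query counts quoted for the three snippets can be reproduced without Django. The sketch below emulates the three access patterns with plain SQL against an in-memory SQLite database. The table layout and every helper name (`build_db`, `CountingDB`, and the three strategy functions) are assumptions made for illustration; this is not the SQL Django itself generates.

```python
import sqlite3


def build_db():
    """Create 100 questions with 5 choices each in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, question_text TEXT)")
    conn.execute(
        "CREATE TABLE choice (id INTEGER PRIMARY KEY, question_id INTEGER, choice_text TEXT)"
    )
    for q in range(1, 101):
        conn.execute("INSERT INTO question VALUES (?, ?)", (q, f"Question {q}"))
        for c in range(5):
            conn.execute(
                "INSERT INTO choice (question_id, choice_text) VALUES (?, ?)",
                (q, f"Choice {c}"),
            )
    return conn


class CountingDB:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def default_strategy(db):
    """N+1 pattern: one query for the choices, then one per choice."""
    rows = []
    for _, qid, ctext in db.query("SELECT id, question_id, choice_text FROM choice ORDER BY id"):
        (qtext,) = db.query("SELECT question_text FROM question WHERE id = ?", (qid,))[0]
        rows.append((qtext, ctext))
    return rows


def prefetch_strategy(db):
    """prefetch_related-like pattern: a second query with an IN clause."""
    choices = db.query("SELECT id, question_id, choice_text FROM choice ORDER BY id")
    ids = sorted({qid for _, qid, _ in choices})
    placeholders = ",".join("?" * len(ids))
    questions = dict(
        db.query(f"SELECT id, question_text FROM question WHERE id IN ({placeholders})", ids)
    )
    return [(questions[qid], ctext) for _, qid, ctext in choices]


def join_strategy(db):
    """select_related-like pattern: a single JOIN."""
    return db.query(
        "SELECT q.question_text, c.choice_text FROM choice c "
        "JOIN question q ON q.id = c.question_id ORDER BY c.id"
    )


if __name__ == "__main__":
    conn = build_db()
    for strategy in (default_strategy, prefetch_strategy, join_strategy):
        db = CountingDB(conn)
        strategy(db)
        print(strategy.__name__, db.queries)  # 501, 2 and 1 queries respectively
```

All three strategies return the same 500 (question_text, choice_text) pairs; only the number of round trips differs.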

I've included select_related for completeness; I'm not going to propose changing anything about its use. There are places where it is the best choice and in those places it will still be up to the user to request it. I will note that anywhere select_related is optimal, prefetch_related is still better than the default, and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know enough about the database and / or ORM to do otherwise.
Experienced users will also often produce this because it's not always obvious which fields will and won't be used, and consequently what should be prefetched.
Additionally that list will change over time. A small change to a template to display an extra field can result in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. Tools like django-perf-rec (https://github.com/YPlan/django-perf-rec), which I was involved in creating, and nplusone (https://github.com/jmcarp/nplusone) exist in part to flag missing prefetches introduced by changed code.
Finally, libraries like Django Rest Framework and the Admin will also produce queries like this because it's very difficult for them to know what needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour, but obviously it needs to be proved first, so for now I'd suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in production. I'm not posting it yet because I'd like to focus on desired behavior rather than implementation details. But in summary, what it does is when accessing a missing field on a model, rather than fetching it just for that instance, it runs a prefetch_related query to fetch it for all peer instances that were fetched in the same queryset. So in the example above it prefetches all Questions in one query.
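
To make the mechanism described above concrete, here is a minimal pure-Python sketch. It is not the proof of concept mentioned in the post (which has not been shown) and it does not use Django; `FakeDB`, `Choice`, `load_choices` and the `auto_prefetch` flag are hypothetical names. Each instance keeps a reference to the peers loaded alongside it, and the first access to the related object fetches it for all of those peers in one query.

```python
class FakeDB:
    """Stand-in for the question table; counts queries and rows fetched."""

    def __init__(self, questions):
        self.questions = questions  # {question_id: question_text}
        self.queries = 0
        self.rows_fetched = 0

    def fetch_questions(self, ids):
        self.queries += 1
        self.rows_fetched += len(ids)
        return {i: self.questions[i] for i in ids}


class Choice:
    def __init__(self, db, question_id, choice_text):
        self._db = db
        self.question_id = question_id
        self.choice_text = choice_text
        # Instances loaded by the same "queryset". A plain list is used for
        # simplicity; a real implementation might hold peers weakly.
        self._peers = [self]
        self._question = None
        self._loaded = False

    @property
    def question(self):
        if not self._loaded:
            # Fetch the related row for every peer that still lacks it,
            # not just for this instance.
            pending = [p for p in self._peers if not p._loaded]
            rows = self._db.fetch_questions({p.question_id for p in pending})
            for p in pending:
                p._question = rows[p.question_id]
                p._loaded = True
        return self._question


def load_choices(db, rows, auto_prefetch=True):
    """Build Choice objects; with auto_prefetch they share one peer list."""
    choices = [Choice(db, qid, text) for qid, text in rows]
    if auto_prefetch:
        for choice in choices:
            choice._peers = choices
    return choices
```

With 100 questions and 500 choices, touching `.question` on every choice costs one question query with `auto_prefetch=True` and 500 with it off. If only two of the 500 choices are ever touched, the automatic version still loads all 100 questions in its single query, while the lazy version loads just two rows in two queries.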

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post-filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of results, filtered out basically all of them, and the code is structured so that the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on a per-queryset or per-object basis, leaving us with a rare downside that can easily be manually resolved in exchange for a significant general improvement.

In practice this thing is almost magical to work with. Unless you already have extensive and tightly maintained prefetches everywhere you get an immediate boost to virtually everything that touches the database, often knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in the work to productize the proof of concept.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
