Questions on Django queryset iterator - wrt select_related and prefetch_related and how it works

Web Architect
Hi,

Could someone please explain the implications of using the Django queryset iterator() together with select_related and prefetch_related?

Also, I am still not quite clear on the concept of iterator(), which I understand returns a generator. If my understanding is correct, whenever a for loop is run on the generator, the DB is queried for each element in the loop, and the result of the query is not stored in memory. So, for some model A,

qs = A.objects.all() probably runs something like "SELECT <all columns/fields> FROM A" in some order. This would probably fetch the results in one go. I am not sure how iterator() changes this.

BTW, I observed that the iterator doesn't work like a typical generator: repeated calls to next() on the generator produce the same value.

I would appreciate it if someone could explain the above or point me to a reference.

Thanks.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/252d11a2-6e49-4c5a-b466-e186cf7254af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Questions on Django queryset iterator - wrt select_related and prefetch_related and how it works

Shawn Milochik-2
I think the benefit of using the iterator is best explained by an example:

Without iterator:

You loop through the queryset, using each item for whatever you're doing. As you do this, all the items stay cached on the queryset, using up RAM. If, after the loop, you want to loop through the data again, you can. Upside: you can re-use the data. Downside: memory usage.

With iterator:

You loop through the queryset, using each item for whatever you're doing. As you do this, items you have already read can be garbage-collected. If you want to loop through the data again, you'll have to hit the database again. Upside: lower memory usage. Downside: the data cannot be re-used.
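
To make the trade-off concrete, here is a rough sketch in plain Python. ToyQuerySet, its queries counter and the rows list are hypothetical stand-ins invented for illustration; this is not Django's actual QuerySet code, only a model of the caching behaviour described above.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a queryset; not Django's implementation."""

    def __init__(self, rows):
        self._rows = rows            # pretend this is the database table
        self._result_cache = None    # filled on first plain iteration
        self.queries = 0             # counts simulated database hits

    def _fetch(self):
        # Simulate one query that yields a fresh object per row.
        self.queries += 1
        for row in self._rows:
            yield dict(row)

    def __iter__(self):
        # Plain iteration: evaluate once, keep every object for re-use.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)

    def iterator(self):
        # iterator(): stream objects and never populate the cache.
        return self._fetch()


rows = [{"id": 1}, {"id": 2}, {"id": 3}]

cached = ToyQuerySet(rows)
first_pass = [r["id"] for r in cached]
second_pass = [r["id"] for r in cached]      # served from the cache
cached_queries = cached.queries

streamed = ToyQuerySet(rows)
stream_first = [r["id"] for r in streamed.iterator()]
stream_second = [r["id"] for r in streamed.iterator()]  # simulated second query
streamed_queries = streamed.queries
```

In this sketch the cached object answers both loops from a single simulated query but holds on to every row, while the streamed one holds nothing and pays for a second query.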




Re: Questions on Django queryset iterator - wrt select_related and prefetch_related and how it works

Web Architect
Hi,

Thanks for your response. But I have observed the following:

Without iterator: It takes some time before the for loop starts executing, and both CPU and memory usage spike during that period, which implies the DB is accessed to fetch all the results.

With iterator: The for loop starts executing immediately and memory usage stays low. This probably implies that not all the results are fetched with a single query.

Based on what you have mentioned, I am not sure how to understand the above behaviour. 

Thanks,

On Friday, March 17, 2017 at 11:27:52 AM UTC+5:30, Shawn Milochik wrote:
I think the benefit of using the iterator is best explained by an example:

Without iterator:

You loop through the queryset, using each item for whatever you're doing. As you do this, all the items stay cached on the queryset, using up RAM. If, after the loop, you want to loop through the data again, you can. Upside: you can re-use the data. Downside: memory usage.

With iterator:

You loop through the queryset, using each item for whatever you're doing. As you do this, items you have already read can be garbage-collected. If you want to loop through the data again, you'll have to hit the database again. Upside: lower memory usage. Downside: the data cannot be re-used.




Re: Questions on Django queryset iterator - wrt select_related and prefetch_related and how it works

knbk
Django uses client-side cursors. Django 1.11, which is currently in beta, switches to server-side cursors for iterator() on PostgreSQL [1], but other databases still use client-side cursors. When a client-side cursor executes a query, it loads all results into memory before any of them can be accessed.

But that's just the raw results. Without iterator(), these raw results are immediately converted to model instances, and related objects that have been loaded with select_related() or prefetch_related() are converted to model instances as well. This can cause a spike in CPU and memory usage. When using iterator(), most of the CPU usage is in the database itself, and the raw results use quite a bit less memory than model instances. The CPU time needed to convert the raw results to model instances is spread out over the loop iterations, and the model instances can in most situations be garbage-collected once the iteration moves on to the next instance.
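
The difference between converting everything up front and converting inside the loop can be sketched in plain Python. The raw_rows list and the to_instance helper are hypothetical stand-ins for the cursor's raw results and for model instantiation; this only models the timing described above and is not how Django is actually written.

```python
# Pretend the client-side cursor has already loaded these raw rows.
raw_rows = [(1, "a"), (2, "b"), (3, "c")]

converted = []  # records each conversion as it happens


def to_instance(row):
    # Hypothetical stand-in for building a model instance from a raw row.
    converted.append(row[0])
    return {"id": row[0], "name": row[1]}


# Without iterator(): every row is converted before the loop body can run.
eager = [to_instance(r) for r in raw_rows]
eager_converted_before_loop = len(converted)

converted.clear()

# With iterator(): conversion happens one row at a time, inside the loop.
lazy = (to_instance(r) for r in raw_rows)
lazy_converted_before_loop = len(converted)
first = next(lazy)
lazy_converted_after_first = len(converted)
```

Under these assumptions the eager version has done all three conversions before the first loop iteration, while the lazy version has done none until the first item is requested, which would match the delay and spike reported without iterator().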


[1] https://docs.djangoproject.com/en/1.11/releases/1.11/#database-backends (third item)
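
On the next() observation from the first message: the thread does not show the code that was run, so this is only a guess. One possible cause is calling next() on a new qs.iterator() each time rather than on a single generator bound to a name. The iterator function below is a hypothetical plain-Python stand-in, not the Django method.

```python
def iterator():
    # Hypothetical stand-in for qs.iterator(): each call starts a new pass.
    yield from ["first", "second", "third"]


# Calling iterator() inside next() creates a new generator every time,
# so each call starts over from the beginning.
repeated = [next(iterator()), next(iterator())]

# Binding the generator to a name once lets next() advance through it.
gen = iterator()
advancing = [next(gen), next(gen)]
```

If that is what happened, keeping one reference (for example it = qs.iterator()) and calling next(it) repeatedly should step through the rows as expected.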

On Friday, March 17, 2017 at 10:47:57 AM UTC+1, Web Architect wrote:
Hi,

Thanks for your response. But I have observed the following:

Without iterator: It takes some time before the for loop starts executing, and both CPU and memory usage spike during that period, which implies the DB is accessed to fetch all the results.

With iterator: The for loop starts executing immediately and memory usage stays low. This probably implies that not all the results are fetched with a single query.

Based on what you have mentioned, I am not sure how to understand the above behaviour. 

Thanks,
