Quantcast

Django queryset High CPU Usage

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Django queryset High CPU Usage

Web Architect
Hi,

I am a bit perplexed by this and not sure what the solution is. Following is the scenario:

There is a Model A with 10000 records. Just a simple queryset - A.objects.all() is resulting in CPU hitting almost 100%. Is there a way to optimize this? But why would such a query result in high CPU Usage?

Would appreciate if anyone could throw some light on the above.

Thanks.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/0c5f0c9f-4f7a-446e-b0bd-fa8815382382%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

On Friday, March 10, 2017 at 10:55:51 AM UTC+5:30, Web Architect wrote:
Hi,

I am a bit perplexed by this and not sure what the solution is. Following is the scenario:

There is a Model A with 10000 records. Just a simple queryset - A.objects.all() is resulting in CPU hitting almost 100%. Is there a way to optimize this? But why would such a query result in high CPU Usage?

Would appreciate if anyone could throw some light on the above.

Thanks.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/e0cc58b8-f44b-4745-be46-fe7b52503b74%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Melvyn Sopacua
In reply to this post by Web Architect

On Thursday 09 March 2017 21:25:51 Web Architect wrote:

 

> I am a bit perplexed by this and not sure what the solution is.

> Following is the scenario:

>

> There is a Model A with 10000 records. Just a simple queryset -

> A.objects.all() is resulting in CPU hitting almost 100%.

 

What's the problem? You have a fast db, fast network and requesting 10k records into memory. Why would you want CPU usage to be lower so that it takes longer?

The question you need to ask yourself is why you need 10k records. Nobody's gonna read them all.

--

Melvyn Sopacua

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/7928487.AGXE0cLiAx%40devstation.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

James Schneider
In reply to this post by Web Architect


On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

Ironically, things being done in Django is the reason for your CPU utilization issue in the first place.

Calling a qs.all() is NOT the same as a SELECT * statement, even more so when speaking to the scale of query that you mention.

Your SQL query is simply listing data in a table. A very easy thing to do, hence the reason it runs quickly.

The qs.all() call is also running the same query (probably). However, in addition to pulling all of the data, it is performing a transformation of that data in to Django model objects. If you are pulling 10K items, then Django is creating 10K objects, which is easily more intensive than a raw SQL query, even for simple model objects. 

In general, there's usually no practical reason to ever pull that many objects from a DB for display on a page. Filter down to a reasonable number (<100 for almost all sane cases) or implement a paging system to limit returned results. It's also probably using a ton of RAM only to be immediately thrown away at the end of the request. Browsers will disintegrate trying to render that many HTML elements simultaneously.

Look at implementing a paging system, possibly through Django's built-in mechanism, or something like Datatables and the infinite scroll plugin.



-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWjdK7wWKSUpyN-aR1b%3DkObuuerBYi85oTxCHMjrUstzg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
Hi James,

Thanks for your response. Melvyn also posed a similar point of not loading the whole records. 

But all the records are needed for reporting purposes - where the data is read from the DB and a csv report is created. I am not quite an expert on Django but I am not sure if there is a better way to do it. 

The scenario is as follows to make it clearer:

Ours is an ecommerce site built on Django. Our admin/accounting team needs to download reports now and then. We have a Django model for the line items purchased. Now there could be 10k line items sold and each line items are associated with other models like payments, shipments etc which is a complex set of relations. 

We do not yet have a sophisticated reporting mechanism but was working on building a simplistic reporting system on Django. But I am finding issues with scaling up - as reported with CPU Usage and the amount of time taken. If there is a way to optimise this - would be great otherwise we might not have to look for standard methods of reporting tools. 

Would appreciate suggestions/advices on the above.

Thanks,

On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:


On Mar 9, 2017 9:37 PM, "Web Architect" <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="p_r6Wv3fAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">pina...@...> wrote:
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

Ironically, things being done in Django is the reason for your CPU utilization issue in the first place.

Calling a qs.all() is NOT the same as a SELECT * statement, even more so when speaking to the scale of query that you mention.

Your SQL query is simply listing data in a table. A very easy thing to do, hence the reason it runs quickly.

The qs.all() call is also running the same query (probably). However, in addition to pulling all of the data, it is performing a transformation of that data in to Django model objects. If you are pulling 10K items, then Django is creating 10K objects, which is easily more intensive than a raw SQL query, even for simple model objects. 

In general, there's usually no practical reason to ever pull that many objects from a DB for display on a page. Filter down to a reasonable number (<100 for almost all sane cases) or implement a paging system to limit returned results. It's also probably using a ton of RAM only to be immediately thrown away at the end of the request. Browsers will disintegrate trying to render that many HTML elements simultaneously.

Look at implementing a paging system, possibly through Django's built-in mechanism, or something like Datatables and the infinite scroll plugin.

<a href="https://docs.djangoproject.com/en/dev/topics/pagination/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;">https://docs.djangoproject.com/en/dev/topics/pagination/

<a href="https://datatables.net/extensions/scroller/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;">https://datatables.net/extensions/scroller/

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

James Bennett
If all you need is to export data from your database (with or without transforming it along the way) to a CSV, using the normal QuerySet methods is probably the wrong approach; you don't need model objects to do that. Some options include:

* Use raw SQL to query for the data and push it to CSV (also, some databases natively understand how to export query results to CSV)
* Use the values() or values_list() methods to use lighter-weight basic Python data structures (dictionaries and lists) instead of model objects
* If you *must* instantiate model objects, use the iterator() method to avoid keeping them around in-memory, and look at server-side cursors as an option
* If you're fetching related data, make sure you're eager-loading the relations to avoid N+1 problems.


On Fri, Mar 10, 2017 at 3:06 AM, Web Architect <[hidden email]> wrote:
Hi James,

Thanks for your response. Melvyn also posed a similar point of not loading the whole records. 

But all the records are needed for reporting purposes - where the data is read from the DB and a csv report is created. I am not quite an expert on Django but I am not sure if there is a better way to do it. 

The scenario is as follows to make it clearer:

Ours is an ecommerce site built on Django. Our admin/accounting team needs to download reports now and then. We have a Django model for the line items purchased. Now there could be 10k line items sold and each line items are associated with other models like payments, shipments etc which is a complex set of relations. 

We do not yet have a sophisticated reporting mechanism but was working on building a simplistic reporting system on Django. But I am finding issues with scaling up - as reported with CPU Usage and the amount of time taken. If there is a way to optimise this - would be great otherwise we might not have to look for standard methods of reporting tools. 

Would appreciate suggestions/advices on the above.

Thanks,

On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:


On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

Ironically, things being done in Django is the reason for your CPU utilization issue in the first place.

Calling a qs.all() is NOT the same as a SELECT * statement, even more so when speaking to the scale of query that you mention.

Your SQL query is simply listing data in a table. A very easy thing to do, hence the reason it runs quickly.

The qs.all() call is also running the same query (probably). However, in addition to pulling all of the data, it is performing a transformation of that data in to Django model objects. If you are pulling 10K items, then Django is creating 10K objects, which is easily more intensive than a raw SQL query, even for simple model objects. 

In general, there's usually no practical reason to ever pull that many objects from a DB for display on a page. Filter down to a reasonable number (<100 for almost all sane cases) or implement a paging system to limit returned results. It's also probably using a ton of RAM only to be immediately thrown away at the end of the request. Browsers will disintegrate trying to render that many HTML elements simultaneously.

Look at implementing a paging system, possibly through Django's built-in mechanism, or something like Datatables and the infinite scroll plugin.



-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Melvyn Sopacua
In reply to this post by Web Architect

On Friday 10 March 2017 03:06:12 Web Architect wrote:

> Hi James,

>

> Thanks for your response. Melvyn also posed a similar point of not

> loading the whole records.

>

> But all the records are needed for reporting purposes - where the data

> is read from the DB and a csv report is created. I am not quite an

> expert on Django but I am not sure if there is a better way to do it.

>

> The scenario is as follows to make it clearer:

>

> Ours is an ecommerce site built on Django. Our admin/accounting team

> needs to download reports now and then. We have a Django model for

> the line items purchased. Now there could be 10k line items sold and

> each line items are associated with other models like payments,

> shipments etc which is a complex set of relations.

 

The most scalable solution is to not send the CSV to the browser and not do it at the webserver.

Use some tasking system like Celery to generate the report at a different server. Use a management command to do it. Then mail the report or make it available as static file via rsync/ssh/whathavyou.

 

You get bonus points for setting up the report generating server with a read-only slave of the database.

 

This scales much better and doesn't tie up webserver resources.

--

Melvyn Sopacua

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/1640433.aR1v7dkvre%40devstation.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
In reply to this post by James Bennett
Hi James,

Thanks for the clarification. Much appreciated. Will follow your points for the reporting part considering the overheads in ORM. 

Thanks,

On Friday, March 10, 2017 at 4:55:35 PM UTC+5:30, James Bennett wrote:
If all you need is to export data from your database (with or without transforming it along the way) to a CSV, using the normal QuerySet methods is probably the wrong approach; you don't need model objects to do that. Some options include:

* Use raw SQL to query for the data and push it to CSV (also, some databases natively understand how to export query results to CSV)
* Use the values() or values_list() methods to use lighter-weight basic Python data structures (dictionaries and lists) instead of model objects
* If you *must* instantiate model objects, use the iterator() method to avoid keeping them around in-memory, and look at server-side cursors as an option
* If you're fetching related data, make sure you're eager-loading the relations to avoid N+1 problems.


On Fri, Mar 10, 2017 at 3:06 AM, Web Architect <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="Ud_VHbDmAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">pina...@...> wrote:
Hi James,

Thanks for your response. Melvyn also posed a similar point of not loading the whole records. 

But all the records are needed for reporting purposes - where the data is read from the DB and a csv report is created. I am not quite an expert on Django but I am not sure if there is a better way to do it. 

The scenario is as follows to make it clearer:

Ours is an ecommerce site built on Django. Our admin/accounting team needs to download reports now and then. We have a Django model for the line items purchased. Now there could be 10k line items sold and each line items are associated with other models like payments, shipments etc which is a complex set of relations. 

We do not yet have a sophisticated reporting mechanism but was working on building a simplistic reporting system on Django. But I am finding issues with scaling up - as reported with CPU Usage and the amount of time taken. If there is a way to optimise this - would be great otherwise we might not have to look for standard methods of reporting tools. 

Would appreciate suggestions/advices on the above.

Thanks,

On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:


On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

Ironically, things being done in Django is the reason for your CPU utilization issue in the first place.

Calling a qs.all() is NOT the same as a SELECT * statement, even more so when speaking to the scale of query that you mention.

Your SQL query is simply listing data in a table. A very easy thing to do, hence the reason it runs quickly.

The qs.all() call is also running the same query (probably). However, in addition to pulling all of the data, it is performing a transformation of that data in to Django model objects. If you are pulling 10K items, then Django is creating 10K objects, which is easily more intensive than a raw SQL query, even for simple model objects. 

In general, there's usually no practical reason to ever pull that many objects from a DB for display on a page. Filter down to a reasonable number (<100 for almost all sane cases) or implement a paging system to limit returned results. It's also probably using a ton of RAM only to be immediately thrown away at the end of the request. Browsers will disintegrate trying to render that many HTML elements simultaneously.

Look at implementing a paging system, possibly through Django's built-in mechanism, or something like Datatables and the infinite scroll plugin.

<a href="https://docs.djangoproject.com/en/dev/topics/pagination/" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;">https://docs.djangoproject.com/en/dev/topics/pagination/

<a href="https://datatables.net/extensions/scroller/" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;">https://datatables.net/extensions/scroller/

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="Ud_VHbDmAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-users...@googlegroups.com.
To post to this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="Ud_VHbDmAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django...@....
Visit this group at <a href="https://groups.google.com/group/django-users" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;">https://groups.google.com/group/django-users.
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium=email&amp;utm_source=footer" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;">https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com.

For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;">https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/2f573688-b141-4b07-9dc6-71806309d6f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Vijay Khemlani
In reply to this post by James Bennett
There is something wrong in your setup

I can query a 400.000 item table in less than a second with the
typical "Model.objects.all()". Django does not convert all of the
entries into objects just by that query.

You don't have any managers, or anything that can result in a
side-effect beyond the query itself?



On 3/10/17, James Bennett <[hidden email]> wrote:

> If all you need is to export data from your database (with or without
> transforming it along the way) to a CSV, using the normal QuerySet methods
> is probably the wrong approach; you don't need model objects to do that.
> Some options include:
>
> * Use raw SQL to query for the data and push it to CSV (also, some
> databases natively understand how to export query results to CSV)
> * Use the values() or values_list() methods to use lighter-weight basic
> Python data structures (dictionaries and lists) instead of model objects
> * If you *must* instantiate model objects, use the iterator() method to
> avoid keeping them around in-memory, and look at server-side cursors as an
> option
> * If you're fetching related data, make sure you're eager-loading the
> relations to avoid N+1 problems.
>
>
> On Fri, Mar 10, 2017 at 3:06 AM, Web Architect <[hidden email]> wrote:
>
>> Hi James,
>>
>> Thanks for your response. Melvyn also posed a similar point of not
>> loading
>> the whole records.
>>
>> But all the records are needed for reporting purposes - where the data is
>> read from the DB and a csv report is created. I am not quite an expert on
>> Django but I am not sure if there is a better way to do it.
>>
>> The scenario is as follows to make it clearer:
>>
>> Ours is an ecommerce site built on Django. Our admin/accounting team
>> needs
>> to download reports now and then. We have a Django model for the line
>> items
>> purchased. Now there could be 10k line items sold and each line items are
>> associated with other models like payments, shipments etc which is a
>> complex set of relations.
>>
>> We do not yet have a sophisticated reporting mechanism but was working on
>> building a simplistic reporting system on Django. But I am finding issues
>> with scaling up - as reported with CPU Usage and the amount of time
>> taken.
>> If there is a way to optimise this - would be great otherwise we might
>> not
>> have to look for standard methods of reporting tools.
>>
>> Would appreciate suggestions/advices on the above.
>>
>> Thanks,
>>
>> On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:
>>>
>>>
>>>
>>> On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
>>>
>>> Would like to further add - the python CPU Usage is hitting almost 100
>>> %.
>>> When I run  a Select * query on Mysql, its quite fast and CPU is normal.
>>> I
>>> am not sure if anything more needs to be done in Django.
>>>
>>>
>>> Ironically, things being done in Django is the reason for your CPU
>>> utilization issue in the first place.
>>>
>>> Calling a qs.all() is NOT the same as a SELECT * statement, even more so
>>> when speaking to the scale of query that you mention.
>>>
>>> Your SQL query is simply listing data in a table. A very easy thing to
>>> do, hence the reason it runs quickly.
>>>
>>> The qs.all() call is also running the same query (probably). However, in
>>> addition to pulling all of the data, it is performing a transformation
>>> of
>>> that data in to Django model objects. If you are pulling 10K items, then
>>> Django is creating 10K objects, which is easily more intensive than a
>>> raw
>>> SQL query, even for simple model objects.
>>>
>>> In general, there's usually no practical reason to ever pull that many
>>> objects from a DB for display on a page. Filter down to a reasonable
>>> number
>>> (<100 for almost all sane cases) or implement a paging system to limit
>>> returned results. It's also probably using a ton of RAM only to be
>>> immediately thrown away at the end of the request. Browsers will
>>> disintegrate trying to render that many HTML elements simultaneously.
>>>
>>> Look at implementing a paging system, possibly through Django's built-in
>>> mechanism, or something like Datatables and the infinite scroll plugin.
>>>
>>> https://docs.djangoproject.com/en/dev/topics/pagination/
>>>
>>> https://datatables.net/extensions/scroller/
>>>
>>> -James
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [hidden email].
>> To post to this group, send email to [hidden email].
>> Visit this group at https://groups.google.com/group/django-users.
>> To view this discussion on the web visit https://groups.google.com/d/
>> msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com
>> <https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email].
> To post to this group, send email to [hidden email].
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CALn3ei2JCmSX_cxQZksFy3gFMKJubei0WJTppZEFaMLbvnxG8Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
In reply to this post by Melvyn Sopacua
Hi Melvyn,

Thanks for your response.

We are using the Celery for all tasks which could be done offline or separately. Report generation is one of it. We trigger the task from Browser but the report creation is run via Celery which when completed sends the csv filepath to the browser to be downloaded. 

But the CPU usage and time taken are high. Though right now we are using a single multicore HW. Since I am not quite adept in Django, hence, I raised the query if there is a way to optimise. I have used defer, only etc to lighten the query but still its taking time and CPU. But seems like ORM is a bottle-neck. 

I agree it's prudent to have the reporting on a separate HW. Would consider it.

Thanks.

On Friday, March 10, 2017 at 5:25:39 PM UTC+5:30, Melvyn Sopacua wrote:

On Friday 10 March 2017 03:06:12 Web Architect wrote:

> Hi James,

>

> Thanks for your response. Melvyn also posed a similar point of not

> loading the whole records.

>

> But all the records are needed for reporting purposes - where the data

> is read from the DB and a csv report is created. I am not quite an

> expert on Django but I am not sure if there is a better way to do it.

>

> The scenario is as follows to make it clearer:

>

> Ours is an ecommerce site built on Django. Our admin/accounting team

> needs to download reports now and then. We have a Django model for

> the line items purchased. Now there could be 10k line items sold and

> each line items are associated with other models like payments,

> shipments etc which is a complex set of relations.

 

The most scalable solution is to not send the CSV to the browser and not do it at the webserver.

Use some tasking system like Celery to generate the report at a different server. Use a management command to do it. Then mail the report or make it available as static file via rsync/ssh/whathavyou.

 

You get bonus points for setting up the report generating server with a read-only slave of the database.

 

This scales much better and doesn't tie up webserver resources.

--

Melvyn Sopacua

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/a9efcf50-7638-47fd-a4a8-cd04192c6af3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
In reply to this post by Vijay Khemlani
Hi,

Thanks for your response. 

Well, its not just the queryset but when the query triggered by an if condition or a for loop - thats what I meant. I was doing some basic performance check as well while working on the report generation - I observed the issue. There's no manager other than the default one. The CPU usage spikes and it takes some time before the Objects are created. 

But when you say not all objects are created at one go, you mean they can be deferred? You mean the query results are stored in the memory till each record is accessed or something using the objects? Could you please clarify?

Thanks. 

On Friday, March 10, 2017 at 5:31:14 PM UTC+5:30, Vijay Khemlani wrote:
There is something wrong in your setup

I can query a 400.000 item table in less than a second with the
typical "Model.objects.all()". Django does not convert all of the
entries into objects just by that query.

You don't have any managers, or anything that can result in a
side-effect beyond the query itself?



On 3/10/17, James Bennett <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="e83sKKLoAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">ubern...@...> wrote:

> If all you need is to export data from your database (with or without
> transforming it along the way) to a CSV, using the normal QuerySet methods
> is probably the wrong approach; you don't need model objects to do that.
> Some options include:
>
> * Use raw SQL to query for the data and push it to CSV (also, some
> databases natively understand how to export query results to CSV)
> * Use the values() or values_list() methods to use lighter-weight basic
> Python data structures (dictionaries and lists) instead of model objects
> * If you *must* instantiate model objects, use the iterator() method to
> avoid keeping them around in-memory, and look at server-side cursors as an
> option
> * If you're fetching related data, make sure you're eager-loading the
> relations to avoid N+1 problems.
>
>
> On Fri, Mar 10, 2017 at 3:06 AM, Web Architect <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="e83sKKLoAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">pina...@...> wrote:
>
>> Hi James,
>>
>> Thanks for your response. Melvyn also posed a similar point of not
>> loading
>> the whole records.
>>
>> But all the records are needed for reporting purposes - where the data is
>> read from the DB and a csv report is created. I am not quite an expert on
>> Django but I am not sure if there is a better way to do it.
>>
>> The scenario is as follows to make it clearer:
>>
>> Ours is an ecommerce site built on Django. Our admin/accounting team
>> needs
>> to download reports now and then. We have a Django model for the line
>> items
>> purchased. Now there could be 10k line items sold and each line items are
>> associated with other models like payments, shipments etc which is a
>> complex set of relations.
>>
>> We do not yet have a sophisticated reporting mechanism but was working on
>> building a simplistic reporting system on Django. But I am finding issues
>> with scaling up - as reported with CPU Usage and the amount of time
>> taken.
>> If there is a way to optimise this - would be great otherwise we might
>> not
>> have to look for standard methods of reporting tools.
>>
>> Would appreciate suggestions/advices on the above.
>>
>> Thanks,
>>
>> On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:
>>>
>>>
>>>
>>> On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
>>>
>>> Would like to further add - the python CPU Usage is hitting almost 100
>>> %.
>>> When I run  a Select * query on Mysql, its quite fast and CPU is normal.
>>> I
>>> am not sure if anything more needs to be done in Django.
>>>
>>>
>>> Ironically, things being done in Django is the reason for your CPU
>>> utilization issue in the first place.
>>>
>>> Calling a qs.all() is NOT the same as a SELECT * statement, even more so
>>> when speaking to the scale of query that you mention.
>>>
>>> Your SQL query is simply listing data in a table. A very easy thing to
>>> do, hence the reason it runs quickly.
>>>
>>> The qs.all() call is also running the same query (probably). However, in
>>> addition to pulling all of the data, it is performing a transformation
>>> of
>>> that data in to Django model objects. If you are pulling 10K items, then
>>> Django is creating 10K objects, which is easily more intensive than a
>>> raw
>>> SQL query, even for simple model objects.
>>>
>>> In general, there's usually no practical reason to ever pull that many
>>> objects from a DB for display on a page. Filter down to a reasonable
>>> number
>>> (<100 for almost all sane cases) or implement a paging system to limit
>>> returned results. It's also probably using a ton of RAM only to be
>>> immediately thrown away at the end of the request. Browsers will
>>> disintegrate trying to render that many HTML elements simultaneously.
>>>
>>> Look at implementing a paging system, possibly through Django's built-in
>>> mechanism, or something like Datatables and the infinite scroll plugin.
>>>
>>> <a href="https://docs.djangoproject.com/en/dev/topics/pagination/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;">https://docs.djangoproject.com/en/dev/topics/pagination/
>>>
>>> <a href="https://datatables.net/extensions/scroller/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;">https://datatables.net/extensions/scroller/
>>>
>>> -James
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="e83sKKLoAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-users...@googlegroups.com.
>> To post to this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="e83sKKLoAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django...@....
>> Visit this group at <a href="https://groups.google.com/group/django-users" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;">https://groups.google.com/group/django-users.
>> To view this discussion on the web visit <a href="https://groups.google.com/d/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/&#39;;return true;">https://groups.google.com/d/
>> msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%<a href="http://40googlegroups.com" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F40googlegroups.com\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEjt8k9OhR10y0jYLX0ZgJix1hnBQ&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F40googlegroups.com\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEjt8k9OhR10y0jYLX0ZgJix1hnBQ&#39;;return true;">40googlegroups.com
>> <<a href="https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium=email&amp;utm_source=footer" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;">https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;">https://groups.google.com/d/optout.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="e83sKKLoAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-users...@googlegroups.com.
> To post to this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="e83sKKLoAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django...@....
> Visit this group at <a href="https://groups.google.com/group/django-users" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;">https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> <a href="https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com&#39;;return true;">https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com.
> For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;">https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/165b08da-cc22-4dcb-90a8-6b35d02ac73f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Vijay Khemlani
This is what you said

"There is a Model A with 10000 records. Just a simple queryset -
A.objects.all() is resulting in CPU hitting almost 100%. Is there a
way to optimize this? But why would such a query result in high CPU
Usage?"

By that, I assume that you opened a shell and executed

"A.objects.all()"

If you have other code that is being executed (your for loop or
something) you better post it to clarify your question.

On 3/10/17, Web Architect <[hidden email]> wrote:

> Hi,
>
> Thanks for your response.
>
> Well, its not just the queryset but when the query triggered by an if
> condition or a for loop - thats what I meant. I was doing some basic
> performance check as well while working on the report generation - I
> observed the issue. There's no manager other than the default one. The CPU
> usage spikes and it takes some time before the Objects are created.
>
> But when you say not all objects are created at one go, you mean they can
> be deferred? You mean the query results are stored in the memory till each
> record is accessed or something using the objects? Could you please
> clarify?
>
> Thanks.
>
> On Friday, March 10, 2017 at 5:31:14 PM UTC+5:30, Vijay Khemlani wrote:
>>
>> There is something wrong in your setup
>>
>> I can query a 400.000 item table in less than a second with the
>> typical "Model.objects.all()". Django does not convert all of the
>> entries into objects just by that query.
>>
>> You don't have any managers, or anything that can result in a
>> side-effect beyond the query itself?
>>
>>
>>
>> On 3/10/17, James Bennett <[hidden email] <javascript:>> wrote:
>> > If all you need is to export data from your database (with or without
>> > transforming it along the way) to a CSV, using the normal QuerySet
>> methods
>> > is probably the wrong approach; you don't need model objects to do that.
>> >
>> > Some options include:
>> >
>> > * Use raw SQL to query for the data and push it to CSV (also, some
>> > databases natively understand how to export query results to CSV)
>> > * Use the values() or values_list() methods to use lighter-weight basic
>> >
>> > Python data structures (dictionaries and lists) instead of model objects
>> >
>> > * If you *must* instantiate model objects, use the iterator() method to
>> >
>> > avoid keeping them around in-memory, and look at server-side cursors as
>> >
>> an
>> > option
>> > * If you're fetching related data, make sure you're eager-loading the
>> > relations to avoid N+1 problems.
>> >
>> >
>> > On Fri, Mar 10, 2017 at 3:06 AM, Web Architect <[hidden email]
>> <javascript:>> wrote:
>> >
>> >> Hi James,
>> >>
>> >> Thanks for your response. Melvyn also posed a similar point of not
>> >> loading
>> >> the whole records.
>> >>
>> >> But all the records are needed for reporting purposes - where the data
>> >>
>> is
>> >> read from the DB and a csv report is created. I am not quite an expert
>> >>
>> on
>> >> Django but I am not sure if there is a better way to do it.
>> >>
>> >> The scenario is as follows to make it clearer:
>> >>
>> >> Ours is an ecommerce site built on Django. Our admin/accounting team
>> >> needs
>> >> to download reports now and then. We have a Django model for the line
>> >> items
>> >> purchased. Now there could be 10k line items sold and each line items
>> are
>> >> associated with other models like payments, shipments etc which is a
>> >> complex set of relations.
>> >>
>> >> We do not yet have a sophisticated reporting mechanism but was working
>> >>
>> on
>> >> building a simplistic reporting system on Django. But I am finding
>> issues
>> >> with scaling up - as reported with CPU Usage and the amount of time
>> >> taken.
>> >> If there is a way to optimise this - would be great otherwise we might
>> >>
>> >> not
>> >> have to look for standard methods of reporting tools.
>> >>
>> >> Would appreciate suggestions/advices on the above.
>> >>
>> >> Thanks,
>> >>
>> >> On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider
>> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
>> >>>
>> >>> Would like to further add - the python CPU Usage is hitting almost 100
>> >>>
>> >>> %.
>> >>> When I run  a Select * query on Mysql, its quite fast and CPU is
>> normal.
>> >>> I
>> >>> am not sure if anything more needs to be done in Django.
>> >>>
>> >>>
>> >>> Ironically, things being done in Django is the reason for your CPU
>> >>> utilization issue in the first place.
>> >>>
>> >>> Calling a qs.all() is NOT the same as a SELECT * statement, even more
>> >>>
>> so
>> >>> when speaking to the scale of query that you mention.
>> >>>
>> >>> Your SQL query is simply listing data in a table. A very easy thing to
>> >>>
>> >>> do, hence the reason it runs quickly.
>> >>>
>> >>> The qs.all() call is also running the same query (probably). However,
>> >>>
>> in
>> >>> addition to pulling all of the data, it is performing a transformation
>> >>>
>> >>> of
>> >>> that data in to Django model objects. If you are pulling 10K items,
>> then
>> >>> Django is creating 10K objects, which is easily more intensive than a
>> >>>
>> >>> raw
>> >>> SQL query, even for simple model objects.
>> >>>
>> >>> In general, there's usually no practical reason to ever pull that many
>> >>>
>> >>> objects from a DB for display on a page. Filter down to a reasonable
>> >>> number
>> >>> (<100 for almost all sane cases) or implement a paging system to limit
>> >>>
>> >>> returned results. It's also probably using a ton of RAM only to be
>> >>> immediately thrown away at the end of the request. Browsers will
>> >>> disintegrate trying to render that many HTML elements simultaneously.
>> >>>
>> >>>
>> >>> Look at implementing a paging system, possibly through Django's
>> built-in
>> >>> mechanism, or something like Datatables and the infinite scroll
>> plugin.
>> >>>
>> >>> https://docs.djangoproject.com/en/dev/topics/pagination/
>> >>>
>> >>> https://datatables.net/extensions/scroller/
>> >>>
>> >>> -James
>> >>>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> Groups
>> >> "Django users" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> >>
>> an
>> >> email to [hidden email] <javascript:>.
>> >> To post to this group, send email to [hidden email]
>> <javascript:>.
>> >> Visit this group at https://groups.google.com/group/django-users.
>> >> To view this discussion on the web visit https://groups.google.com/d/
>> >> msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%
>> 40googlegroups.com
>> >> <
>> https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>
>>
>> >> .
>> >>
>> >> For more options, visit https://groups.google.com/d/optout.
>> >>
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "Django users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to [hidden email] <javascript:>.
>> > To post to this group, send email to [hidden email]
>> <javascript:>.
>> > Visit this group at https://groups.google.com/group/django-users.
>> > To view this discussion on the web visit
>> >
>> https://groups.google.com/d/msgid/django-users/CAL13Cg9k7DhTKEgYi%3DB59NLZck2oP6gzu_Bjckz%3D_wqYwYA%3D3Q%40mail.gmail.com.
>>
>>
>> > For more options, visit https://groups.google.com/d/optout.
>> >
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email].
> To post to this group, send email to [hidden email].
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/165b08da-cc22-4dcb-90a8-6b35d02ac73f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CALn3ei3L0qc11vo6m0OJKA43UGc%3D0gvzs4fz7naRhV45X54kJg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Camilo Torres
In reply to this post by Web Architect
Hi.

You can probably use another web framework for that report, or even rethink your architecture and put the report creation outside Django and the web server without Django ORM.

You may be interested in evaluating Celery and Django Celery. You can create a separate task outside Django to create such report, call the task from within Django with Celery, create the report and email to the user.

Regards.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/3828e090-9f99-4a34-a6eb-aef2a9386e8e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Daniel Hepper-2
In reply to this post by Web Architect
In additions to the suggestions you already received from others, have a look at django-import-export. It allows you to easily export data in various formats.

Hope that helps,
Daniel Hepper
https://consideratecode.com

On Friday, March 10, 2017 at 12:06:13 PM UTC+1, Web Architect wrote:
Hi James,

Thanks for your response. Melvyn also posed a similar point of not loading the whole records. 

But all the records are needed for reporting purposes - where the data is read from the DB and a csv report is created. I am not quite an expert on Django but I am not sure if there is a better way to do it. 

The scenario is as follows to make it clearer:

Ours is an ecommerce site built on Django. Our admin/accounting team needs to download reports now and then. We have a Django model for the line items purchased. Now there could be 10k line items sold and each line items are associated with other models like payments, shipments etc which is a complex set of relations. 

We do not yet have a sophisticated reporting mechanism but was working on building a simplistic reporting system on Django. But I am finding issues with scaling up - as reported with CPU Usage and the amount of time taken. If there is a way to optimise this - would be great otherwise we might not have to look for standard methods of reporting tools. 

Would appreciate suggestions/advices on the above.

Thanks,

On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:


On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

Ironically, things being done in Django is the reason for your CPU utilization issue in the first place.

Calling a qs.all() is NOT the same as a SELECT * statement, even more so when speaking to the scale of query that you mention.

Your SQL query is simply listing data in a table. A very easy thing to do, hence the reason it runs quickly.

The qs.all() call is also running the same query (probably). However, in addition to pulling all of the data, it is performing a transformation of that data in to Django model objects. If you are pulling 10K items, then Django is creating 10K objects, which is easily more intensive than a raw SQL query, even for simple model objects. 

In general, there's usually no practical reason to ever pull that many objects from a DB for display on a page. Filter down to a reasonable number (<100 for almost all sane cases) or implement a paging system to limit returned results. It's also probably using a ton of RAM only to be immediately thrown away at the end of the request. Browsers will disintegrate trying to render that many HTML elements simultaneously.

Look at implementing a paging system, possibly through Django's built-in mechanism, or something like Datatables and the infinite scroll plugin.

<a href="https://docs.djangoproject.com/en/dev/topics/pagination/" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2Fdev%2Ftopics%2Fpagination%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEkN7hUA9IT158zJyNhum0xbywN3w&#39;;return true;">https://docs.djangoproject.com/en/dev/topics/pagination/

<a href="https://datatables.net/extensions/scroller/" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdatatables.net%2Fextensions%2Fscroller%2F\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHSgfySjLvJN76tc2nme7HWNn5gQw&#39;;return true;">https://datatables.net/extensions/scroller/

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/90218749-c5e1-4ca8-ab6c-3e191a79798f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Vijay Khemlani
Am I the only one who thinks that generating a report over a set of
just 10.000 records could be done in 10 - 20 secs unless there are
some serious computations going on with that data?

For a report I have to query around 200.000 records, with
aggregations, and it takes less than a minute using the ORM.

On 3/11/17, Daniel Hepper <[hidden email]> wrote:

> In additions to the suggestions you already received from others, have a
> look at django-import-export. It allows you to easily export data in
> various formats.
>
> Hope that helps,
> Daniel Hepper
> https://consideratecode.com
>
> On Friday, March 10, 2017 at 12:06:13 PM UTC+1, Web Architect wrote:
>>
>> Hi James,
>>
>> Thanks for your response. Melvyn also posed a similar point of not loading
>>
>> the whole records.
>>
>> But all the records are needed for reporting purposes - where the data is
>>
>> read from the DB and a csv report is created. I am not quite an expert on
>>
>> Django but I am not sure if there is a better way to do it.
>>
>> The scenario is as follows to make it clearer:
>>
>> Ours is an ecommerce site built on Django. Our admin/accounting team needs
>>
>> to download reports now and then. We have a Django model for the line
>> items
>> purchased. Now there could be 10k line items sold and each line items are
>>
>> associated with other models like payments, shipments etc which is a
>> complex set of relations.
>>
>> We do not yet have a sophisticated reporting mechanism but was working on
>>
>> building a simplistic reporting system on Django. But I am finding issues
>>
>> with scaling up - as reported with CPU Usage and the amount of time taken.
>>
>> If there is a way to optimise this - would be great otherwise we might not
>>
>> have to look for standard methods of reporting tools.
>>
>> Would appreciate suggestions/advices on the above.
>>
>> Thanks,
>>
>> On Friday, March 10, 2017 at 2:52:50 PM UTC+5:30, James Schneider wrote:
>>>
>>>
>>>
>>> On Mar 9, 2017 9:37 PM, "Web Architect" <[hidden email]> wrote:
>>>
>>> Would like to further add - the python CPU Usage is hitting almost 100 %.
>>>
>>> When I run  a Select * query on Mysql, its quite fast and CPU is normal.
>>> I
>>> am not sure if anything more needs to be done in Django.
>>>
>>>
>>> Ironically, things being done in Django is the reason for your CPU
>>> utilization issue in the first place.
>>>
>>> Calling a qs.all() is NOT the same as a SELECT * statement, even more so
>>>
>>> when speaking to the scale of query that you mention.
>>>
>>> Your SQL query is simply listing data in a table. A very easy thing to
>>> do, hence the reason it runs quickly.
>>>
>>> The qs.all() call is also running the same query (probably). However, in
>>>
>>> addition to pulling all of the data, it is performing a transformation of
>>>
>>> that data in to Django model objects. If you are pulling 10K items, then
>>>
>>> Django is creating 10K objects, which is easily more intensive than a raw
>>>
>>> SQL query, even for simple model objects.
>>>
>>> In general, there's usually no practical reason to ever pull that many
>>> objects from a DB for display on a page. Filter down to a reasonable
>>> number
>>> (<100 for almost all sane cases) or implement a paging system to limit
>>> returned results. It's also probably using a ton of RAM only to be
>>> immediately thrown away at the end of the request. Browsers will
>>> disintegrate trying to render that many HTML elements simultaneously.
>>>
>>> Look at implementing a paging system, possibly through Django's built-in
>>>
>>> mechanism, or something like Datatables and the infinite scroll plugin.
>>>
>>> https://docs.djangoproject.com/en/dev/topics/pagination/
>>>
>>> https://datatables.net/extensions/scroller/
>>>
>>> -James
>>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email].
> To post to this group, send email to [hidden email].
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/90218749-c5e1-4ca8-ab6c-3e191a79798f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CALn3ei03dwabCie0J6xMKxynDB%3Dr%3D%2Byzty-Vf1arnU-Z5HbfAA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

James Schneider


On Mar 11, 2017 12:01 PM, "Vijay Khemlani" <[hidden email]> wrote:
Am I the only one who thinks that generating a report over a set of
just 10.000 records could be done in 10 - 20 secs unless there are
some serious computations going on with that data?

For a report I have to query around 200.000 records, with
aggregations, and it takes less than a minute using the ORM.

The OP never mentioned a time interval that I can find in this thread, only CPU utilization. I can only imagine that the query is taking long enough to notice the CPU utilization, which would be at least a few seconds.

Querying and aggregating 200K records within the DB is not comparable to pulling 10K individual records and performing processing on each one. An ORM call with aggregation will perform a large majority of the work in the DB, and the ORM simply wraps the response accordingly. 

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWZFoHQD%3D9UpSQzmpzO70_7MXuw6J01myYrAQ4ZN-uX4g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Vijay Khemlani
"But the CPU usage and time taken are high" <- I'm assuming high
enough to be problematic for OP.

I'm seriously not following. Why are people suggesting reporting and
export software when OP hasn't even described the problem in detail.
It's not even clear whether the high cpu and time taken are due to the
basic query ("Model.objects.all()") or the further processing of the
report.

It could easily be a missing "select_related" which causes thousands
of joins inside a for loop.

On 3/11/17, James Schneider <[hidden email]> wrote:

> On Mar 11, 2017 12:01 PM, "Vijay Khemlani" <[hidden email]> wrote:
>
> Am I the only one who thinks that generating a report over a set of
> just 10.000 records could be done in 10 - 20 secs unless there are
> some serious computations going on with that data?
>
> For a report I have to query around 200.000 records, with
> aggregations, and it takes less than a minute using the ORM.
>
>
> The OP never mentioned a time interval that I can find in this thread, only
> CPU utilization. I can only imagine that the query is taking long enough to
> notice the CPU utilization, which would be at least a few seconds.
>
> Querying and aggregating 200K records within the DB is not comparable to
> pulling 10K individual records and performing processing on each one. An
> ORM call with aggregation will perform a large majority of the work in the
> DB, and the ORM simply wraps the response accordingly.
>
> -James
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email].
> To post to this group, send email to [hidden email].
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWZFoHQD%3D9UpSQzmpzO70_7MXuw6J01myYrAQ4ZN-uX4g%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CALn3ei3kzzz7XcN4m8DzBO5cAyr9PNhCsw4C_6aRCdk4iVYj7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Melvyn Sopacua

On Saturday 11 March 2017 21:29:10 Vijay Khemlani wrote:

> "But the CPU usage and time taken are high" <- I'm assuming high

> enough to be problematic for OP.

>

> I'm seriously not following. Why are people suggesting reporting and

> export software when OP hasn't even described the problem in detail.

 

Several reasons. Some chime in without reading the entire thread (cause OP already stated he's using Celery and displaying a notice when report is done). Another is that, as you've said, there's not enough detail to get to the root cause, but you gotta start somewhere.

 

Another is that the question is two-fold:

1) What causes CPU to spike

2) How can I scale this better

 

From personal experience, I find that displaying a notice "job is scheduled. It's estimated to be done at h:m and you will be notified by email" instills a calmness on project owners, that would otherwise make them adjust project requirements, cause they're biting their nails looking at a progress bar.

 

> It's not even clear whether the high cpu and time taken are due to the

> basic query ("Model.objects.all()") or the further processing of the

> report.

 

Agreed. But none the less, offloading to other hardware frees up webserver resources and scales better.

In fact, I would stop investigating if there's other things to finish, get the budget for the second machine and pick this up again in final stages to see if there's something that can be done.

 

> It could easily be a missing "select_related" which causes thousands

> of joins inside a for loop.

 

Good one. Takes a few seconds to check if not move on.

 

--

Melvyn Sopacua

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/1678874.4jI55StGvn%40devstation.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
In reply to this post by Vijay Khemlani
Hi Vjiay,

My apologies if the scenario is not clear. Following are the details:

Lets say there is a Model A (with fields, foreignkeys and ManyToMany relationships with other models). There are 10k records for A. Lets say following is the pseudo code for the report:

As = A.objects.all()

for a in As:
     retrieve other related data from associated models.
     Write Data in a csv report file

There's no CPU intensive work above - its just fetching data. I had used select_related and prefetch_related to reduce DB queries - otherwise MySQL CPU usage was going up.

The above was run in Celery as separate task. The Python CPU was hitting almost 100% and it was taking time to generate the report - more than 300s.

To debug the issue, I broke the above code and made it simple to narrow down the issue.

So, I just did the following:

As = A.objects.all()

if As:
     print "hello"

In the above, the CPU was hitting almost 100% and was taking almost a second or  more before Hello was printed. I also did select_related and prefetch_related to check further.

Hence, the conclusion that the query was creating the CPU spike. 

Hope I am clear.

Thanks,


On Sunday, March 12, 2017 at 6:00:00 AM UTC+5:30, Vijay Khemlani wrote:
"But the CPU usage and time taken are high" <- I'm assuming high
enough to be problematic for OP.

I'm seriously not following. Why are people suggesting reporting and
export software when OP hasn't even described the problem in detail.
It's not even clear whether the high cpu and time taken are due to the
basic query ("Model.objects.all()") or the further processing of the
report.

It could easily be a missing "select_related" which causes thousands
of joins inside a for loop.

On 3/11/17, James Schneider <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="nXLewRJgAwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">jrschn...@...> wrote:

> On Mar 11, 2017 12:01 PM, "Vijay Khemlani" <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="nXLewRJgAwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">vkhe...@...> wrote:
>
> Am I the only one who thinks that generating a report over a set of
> just 10.000 records could be done in 10 - 20 secs unless there are
> some serious computations going on with that data?
>
> For a report I have to query around 200.000 records, with
> aggregations, and it takes less than a minute using the ORM.
>
>
> The OP never mentioned a time interval that I can find in this thread, only
> CPU utilization. I can only imagine that the query is taking long enough to
> notice the CPU utilization, which would be at least a few seconds.
>
> Querying and aggregating 200K records within the DB is not comparable to
> pulling 10K individual records and performing processing on each one. An
> ORM call with aggregation will perform a large majority of the work in the
> DB, and the ORM simply wraps the response accordingly.
>
> -James
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="nXLewRJgAwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-users...@googlegroups.com.
> To post to this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="nXLewRJgAwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django...@....
> Visit this group at <a href="https://groups.google.com/group/django-users" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/group/django-users&#39;;return true;">https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> <a href="https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWZFoHQD%3D9UpSQzmpzO70_7MXuw6J01myYrAQ4ZN-uX4g%40mail.gmail.com" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWZFoHQD%3D9UpSQzmpzO70_7MXuw6J01myYrAQ4ZN-uX4g%40mail.gmail.com&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWZFoHQD%3D9UpSQzmpzO70_7MXuw6J01myYrAQ4ZN-uX4g%40mail.gmail.com&#39;;return true;">https://groups.google.com/d/msgid/django-users/CA%2Be%2BciWZFoHQD%3D9UpSQzmpzO70_7MXuw6J01myYrAQ4ZN-uX4g%40mail.gmail.com.
> For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;">https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/4f3b29f5-b800-4bbc-bdb4-9b3e41ad2656%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Django queryset High CPU Usage

Web Architect
In reply to this post by James Schneider


On Sunday, March 12, 2017 at 4:53:25 AM UTC+5:30, James Schneider wrote:


On Mar 11, 2017 12:01 PM, "Vijay Khemlani" <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="cuQ10HBcAwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">vkhe...@...> wrote:
Am I the only one who thinks that generating a report over a set of
just 10.000 records could be done in 10 - 20 secs unless there are
some serious computations going on with that data?

For a report I have to query around 200.000 records, with
aggregations, and it takes less than a minute using the ORM.

The OP never mentioned a time interval that I can find in this thread, only CPU utilization. I can only imagine that the query is taking long enough to notice the CPU utilization, which would be at least a few seconds.

Querying and aggregating 200K records within the DB is not comparable to pulling 10K individual records and performing processing on each one. An ORM call with aggregation will perform a large majority of the work in the DB, and the ORM simply wraps the response accordingly. 

I agree. The task was mostly to pull data from DB and generate the CSV. The  model structure was as follows:

Main Model A

A has foreignkey to model D
A has foriegnkey to model E
F has foreignkey to model D

Model B having manyTomany relationship with A through B1
Model C having manyTomany relationship with A through C1

It was mostly DB fetches which was the complex part. Something like fetch the latest of B related to each record of A. Fetch the latest of F related to the record of D which is related to each record of A. 

There were no CPU intensive calculations done. Though the overall DB fetches were complex, as mentioned in my response to Vijay, I just broke down the issue and tried the single queryset with select_related and prefetch_related and also "defer" and "only" to make the query lighter. But weren't helping. In fact the prefetch related on B and C were making the overall query heavier and hence, increasing the response time further.

Please note the above example was just the subset of the overall DB relations and models.


-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/b169cc33-8120-441c-847d-6977d0e0712a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
12
Loading...