Instance Based Management?

Instance Based Management?

silverstrings026

Hello, I was wondering about instance based management. If I'm wrong, please tell me.

When we have users and user-generated content in a large database, query times increase significantly. Why is there no instance-based manager (like models.Manager()) that basically generates a table for each user and queries ONLY that table? Would that not just flatten the database instead of increasing its size? For example, if we have 1,000,000 users, all of whom generate at least 10 posts per day, and one of the users only generates 5 in the span of 10 days, then unless we have a many-to-many field or something to hold those five posts, the query time to find their posts would be ridiculous.

So if we have a table generated for each user that holds arbitrary connections to anything they generate, it would in theory cut query times significantly. Why is there no feature like this? Again, if I'm wrong please tell me, but as I understand it the number of tables doesn't matter and the data they hold does: 1,000,000,000 posts will always be the size of 1,000,000,000 posts no matter how they are organized.

I've got ideas on implementation, and even on asynchronous support and customization, but I have no idea how to bring this up with the Django developers, and I'm not even sure it would work (though, no matter how hard I try, I can't see anything wrong with it).

Let me know your input, and whether there's a way I can ask the Django devs about this and possibly even suggest a few things pertaining to it. I'd like to help make Django the best it can be, and if this works and we can implement it, Django will be very fast with user-generated content.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7e36ded7-2f3d-43c2-881c-cbc75c80b5c2n%40googlegroups.com.

Re: Instance Based Management?

Daryl
What you are describing sounds like denormalisation, but the first step would be to ensure that the indexes on the child tables are being created. 

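With an index on the lookup field, the time to find rows grows roughly logarithmically with the size of the table (for a typical B-tree index); without an index the database has to scan the whole table, so it grows roughly linearly.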
You should only denormalise after you have investigated the root cause of your performance issues. Most of the time, denormalisation is a form of premature optimisation [ https://stackify.com/premature-optimization-evil/ ].
If you are talking about virtual denormalisation (i.e. a view or table generated by the manager you mention), then this again relies on indexing of the underlying data to work, so you get no performance gain.
Correct me if I'm wrong, but AFAIK this is true of most managers: they help you with your logical view of the data, not the performance of retrieving it.

There are many Django deployments where performance is fine with orders of magnitude more data than you are referring to; it just takes careful planning.

D
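
To illustrate the point about indexes, here is a minimal sketch using Python's built-in sqlite3 module rather than Django itself. The `post` table, its columns, the index name and the row counts are all made up for illustration; the only aim is to show the query planner switching from a full table scan to an index lookup once the lookup column is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)"
)
# Hypothetical data set: 1,000 users with 20 posts each.
conn.executemany(
    "INSERT INTO post (user_id, body) VALUES (?, ?)",
    ((user_id, "post") for user_id in range(1000) for _ in range(20)),
)


def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


query = "SELECT body FROM post WHERE user_id = 42"

# Without an index on user_id, every row has to be examined.
plan_without_index = plan(query)

# With an index, the planner can jump straight to the matching rows.
conn.execute("CREATE INDEX post_user_id_idx ON post (user_id)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

On a typical SQLite build the first plan reports a scan of `post` and the second a search using `post_user_id_idx`, though the exact wording varies between SQLite versions. In Django, a ForeignKey column gets an index like this by default, so one shared table filtered by user is usually already an indexed lookup.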


On Tue, 27 Oct 2020 at 13:40, Matthew Amstutz <[hidden email]> wrote:

--
-- 
======================
Daryl Egarr,  Director
Kawhai Consultants Ltd
Cell       021 521 353
[hidden email]
======================


Re: Instance Based Management?

Jure Erznožnik-2
In reply to this post by silverstrings026

Hi Matthew,

I think you found the wrong mailing list for this question. Might I suggest you try [hidden email]? The question seems better suited there.

That said, I don't know why you wouldn't want to use foreign keys in this scenario, but Django does support a thing called content types for what you seem to be suggesting. There's a section on that page called "Generic relations".

Have a look.

LP,
Jure
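
For reference, the two options mentioned above (a plain foreign key and a generic relation) might look roughly like this in a Django app's models.py. This is only a sketch: the model names `Post` and `Activity` and their field names are hypothetical, while `ForeignKey`, `ContentType` and `GenericForeignKey` are Django's documented APIs. It is a fragment meant to live inside a configured Django project, not a standalone script.

```python
from django.conf import settings
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class Post(models.Model):
    # Plain foreign key; Django creates a database index on it by default.
    author = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="posts"
    )
    body = models.TextField()


class Activity(models.Model):
    # Generic relation: each row can point at an instance of any model.
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="activity"
    )
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    target = GenericForeignKey("content_type", "object_id")

    class Meta:
        # The generic pair is not indexed automatically, so index it explicitly.
        indexes = [models.Index(fields=["content_type", "object_id"])]
```

With models like these, `some_user.posts.all()` or `Post.objects.filter(author=some_user)` fetches one user's posts from the shared table through the indexed `author_id` column, which is the lookup a per-user table was meant to speed up.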

On 27. 10. 20 01:21, Matthew Amstutz wrote:


Re: Instance Based Management?

Tom Forbes
I think what Matthew really wants is support for table partitioning. You can get this right now for PostgreSQL with this library [1]. I’m not sure whether it makes sense to add this to core, although database support for partitioning is quite broad (MySQL, MariaDB, PostgreSQL and Oracle).

1. https://django-postgres-extra.readthedocs.io/en/master/table_partitioning.html

Tom
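
To make the contrast with "one table per user" concrete, here is a hand-rolled sketch of the idea behind partitioning: one logical set of posts split across a small, fixed number of tables, chosen by a key. It uses Python's built-in sqlite3 module and a simple modulo on the user id. Everything here (`NUM_PARTITIONS`, the `post_p0`..`post_p3` tables, the helper functions) is hypothetical; real databases such as PostgreSQL do this routing for you once a table is declared as partitioned, and this is not how the linked library is used.

```python
import sqlite3

NUM_PARTITIONS = 4  # hypothetical; fixed up front, independent of the number of users

conn = sqlite3.connect(":memory:")
for n in range(NUM_PARTITIONS):
    conn.execute(f"CREATE TABLE post_p{n} (user_id INTEGER NOT NULL, body TEXT)")
    conn.execute(f"CREATE INDEX post_p{n}_user_idx ON post_p{n} (user_id)")


def partition_for(user_id):
    """Name of the partition table that holds every post of this user."""
    return f"post_p{user_id % NUM_PARTITIONS}"


def add_post(user_id, body):
    conn.execute(
        f"INSERT INTO {partition_for(user_id)} (user_id, body) VALUES (?, ?)",
        (user_id, body),
    )


def posts_for(user_id):
    # Only the user's own partition is queried; the others are never touched.
    rows = conn.execute(
        f"SELECT body FROM {partition_for(user_id)} WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    )
    return [body for (body,) in rows]


for user_id in range(10):
    add_post(user_id, f"hello from user {user_id}")
add_post(6, "second post")

print(partition_for(6))
print(posts_for(6))
```

The number of tables stays at four however many users sign up, and each lookup still benefits from the per-partition index, which is why partitioning tends to be preferred over generating a table per user.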

On 27 Oct 2020, at 05:30, Jure Erznožnik <[hidden email]> wrote:




Re: Instance Based Management?

silverstrings026
The idea was to automatically generate a table with auto-generated fields that gets arbitrarily connected to any content that can be spawned by users... Then I realized this is pretty much an M2M field with extra steps, but it wouldn't let me delete the post lol. I didn't know about table partitioning though, so thank you for the information. I know about ContentTypes and such; I was just thinking it may be possible to cut some of the work out for the developer, so that instead of explicitly declaring fields they can just access the existing one. Like I said, I realized this was stupid but the group wouldn't let me delete it lol. Thanks again for the information. I only posted on this group because I thought the idea could be put in core to cut some of the work out for developers.

Thanks again everyone, sorry for the ridiculously stupid post lol (I swear I'm not 100% new)

On Tuesday, October 27, 2020 at 7:14:08 AM UTC-4 [hidden email] wrote:

Re: Instance Based Management?

Adam Johnson-2
> Thanks again everyone, sorry for the ridiculously stupid post lol (I swear I'm not 100% new)

Self-insulting is not tolerated on this list! You're not stupid, nor was your post. It's an interesting problem to have, thank you for posting.

If indexes aren't working, partitioning does seem to be the way to reach high scale, and perhaps Django could do more to support it. If you try it out, do let us know if you spot any improvements that could be made to Django.



On Tue, 27 Oct 2020 at 14:57, Matthew Amstutz <[hidden email]> wrote:

--
Adam


Re: Instance Based Management?

silverstrings026
Thanks Adam! I'll do some experimenting and see what I can come up with! Also, I'll stop insulting myself lol. I wasn't expecting such polite replies; everyone has been telling me that a lot of developers are mean to newbies.

I've been coding for 2 years and using Django for a year and a half. I've got a lot of ideas to try to make things easier for developers, but experimentation is going to be the most important thing for that. I'll try to get some data before posting next time.

Thanks again!

On Tue, Oct 27, 2020, 11:11 AM Adam Johnson <[hidden email]> wrote: