GSoC10 Idea: Additional queryset methods

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

GSoC10 Idea: Additional queryset methods

Lucas van Dijk
## The Django ORM

### The Problem

Django is a awesome webframework, and works quite well. But after
using it for about a year now, I've came across some annoyances. One
great thing of django is the amount of third party reusable apps,
unfortunately, this comes at a cost in the number of database queries.
For example, the comments application, which has a template tag
'get_comment_count', this tag executes a query everytime when it gets
called. Or even better, the django-avatar app which executes 2 SQL
queries each time the {% avatar %} is called. This is not really
efficient as it often possible to select the required data with
joins.

With a custom Manager class, you can quite easily annotate the count
of an related object, but unfortunately not for generic relations.
When you want to add the comment count to, for example, a list of news
items, without executing a query for each news item, you'll have to
fallback to the 'extra' method of a queryset, which often will contain
raw SQL.

### The Solution

I want to refactor the QuerySet and other databases related API's,
which adds an additional method of selecting records, and provides
more control of what data you'll select, but probably is a bit more
complex than the current API. It's a bit inspired by Doctrine, a PHP
ORM, and yes, one of the few decent PHP projects.

Example 1.1:
    qs = QuerySet().select(News.all, User.username,
            Count('c__id'), comment_average=Avg(OtherModel.field)
        ).from(News) \
        .leftJoin('comments c')
        .filter(news__title="value") \
        .order_by('comment_average')
(This is just a not-working dummy example)

As you can see you can explicitly set the fields and tables you want
to select. Beside that the existing queryset methods will remain to
work. There'll be a slight difference though: instead of the related
name in filter/order_by/other methods you don't use the related name,
but the table name specified in the from method.

The select method supports selecting fields by string and python
object, the same for the from method. In addition, the select method
also supports the builtin django aggregates. Renaming selected fields
works the same as the aggregate method, just use python's named
parameters.

It also provides a simple method for joins, simply put the name of the
table you want to join, or the model class, and it will automaticly
generate the ON statement, when the two models are related, but is
overrideable with the second parameter of the leftJoin method.

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Reply | Threaded
Open this post in threaded view
|

Re: GSoC10 Idea: Additional queryset methods

Gert Van Gool
What would be the difference with the new raw method on manager
(http://docs.djangoproject.com/en/dev/topics/db/sql/#django.db.models.Manager.raw)?

-- Gert

Mobile: +32 498725202
Web: http://gert.selentic.net



On Fri, Mar 19, 2010 at 13:39, Sh4wn <[hidden email]> wrote:

> ## The Django ORM
>
> ### The Problem
>
> Django is a awesome webframework, and works quite well. But after
> using it for about a year now, I've came across some annoyances. One
> great thing of django is the amount of third party reusable apps,
> unfortunately, this comes at a cost in the number of database queries.
> For example, the comments application, which has a template tag
> 'get_comment_count', this tag executes a query everytime when it gets
> called. Or even better, the django-avatar app which executes 2 SQL
> queries each time the {% avatar %} is called. This is not really
> efficient as it often possible to select the required data with
> joins.
>
> With a custom Manager class, you can quite easily annotate the count
> of an related object, but unfortunately not for generic relations.
> When you want to add the comment count to, for example, a list of news
> items, without executing a query for each news item, you'll have to
> fallback to the 'extra' method of a queryset, which often will contain
> raw SQL.
>
> ### The Solution
>
> I want to refactor the QuerySet and other databases related API's,
> which adds an additional method of selecting records, and provides
> more control of what data you'll select, but probably is a bit more
> complex than the current API. It's a bit inspired by Doctrine, a PHP
> ORM, and yes, one of the few decent PHP projects.
>
> Example 1.1:
>    qs = QuerySet().select(News.all, User.username,
>            Count('c__id'), comment_average=Avg(OtherModel.field)
>        ).from(News) \
>        .leftJoin('comments c')
>        .filter(news__title="value") \
>        .order_by('comment_average')
> (This is just a not-working dummy example)
>
> As you can see you can explicitly set the fields and tables you want
> to select. Beside that the existing queryset methods will remain to
> work. There'll be a slight difference though: instead of the related
> name in filter/order_by/other methods you don't use the related name,
> but the table name specified in the from method.
>
> The select method supports selecting fields by string and python
> object, the same for the from method. In addition, the select method
> also supports the builtin django aggregates. Renaming selected fields
> works the same as the aggregate method, just use python's named
> parameters.
>
> It also provides a simple method for joins, simply put the name of the
> table you want to join, or the model class, and it will automaticly
> generate the ON statement, when the two models are related, but is
> overrideable with the second parameter of the leftJoin method.
>
> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to [hidden email].
> To unsubscribe from this group, send email to [hidden email].
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Reply | Threaded
Open this post in threaded view
|

Re: GSoC10 Idea: Additional queryset methods

Jacob Kaplan-Moss-2
In reply to this post by Lucas van Dijk
On Fri, Mar 19, 2010 at 7:39 AM, Sh4wn <[hidden email]> wrote:
> I want to refactor the QuerySet and other databases related API's,
> which adds an additional method of selecting records, and provides
> more control of what data you'll select, but probably is a bit more
> complex than the current API.

I don't want to discourage you, but...

Wait.

Actually, I *do* want to discourage you. The chance of this getting
accepted is going to be very, very, very low. If you're going to
change Django's query syntax, you're going to need to provide an
incredibly convincing argument, and you're going to need to explain
why this is so important, so vital, that it's worth breaking the
millions of lines of Django code already out there. We take backwards
compatibility incredibly seriously, so any discussion of changing the
model layer needs to start there.

So fair warning: you're going to be fighting an uphill battle on this one.

Jacob

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Reply | Threaded
Open this post in threaded view
|

Re: GSoC10 Idea: Additional queryset methods

Karen Tracey-2
On Fri, Mar 19, 2010 at 10:47 AM, Jacob Kaplan-Moss <[hidden email]> wrote:
On Fri, Mar 19, 2010 at 7:39 AM, Sh4wn <[hidden email]> wrote:
> I want to refactor the QuerySet and other databases related API's,
> which adds an additional method of selecting records, and provides
> more control of what data you'll select, but probably is a bit more
> complex than the current API.

I don't want to discourage you, but...

Wait.

Actually, I *do* want to discourage you. [snip backwards compatibility concerns]

It wasn't immediately clear to me that this proposal involved breaking backwards compatibility -- it sounded like it might just involve adding methods and not necessarily changing the behavior of existing ones.

However, it does seem to be in direct conflict with the idea of getting to the ORM working on non-SQL databases. The additional methods mentioned (select, from, leftJoin) together all sound like they are moving the app-ORM interface closer to raw SQL rather than away from it. Given the more long-standing interest in getting the ORM working with things like app engine's BigTable, I can't see a proposal for movement more towards raw SQL being accepted.

Karen

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Reply | Threaded
Open this post in threaded view
|

Re: GSoC10 Idea: Additional queryset methods

Taylor Marshall
On Fri, Mar 19, 2010 at 11:09 AM, Karen Tracey <[hidden email]> wrote:
> However, it does seem to be in direct conflict with the idea of getting to
> the ORM working on non-SQL databases. The additional methods mentioned
> (select, from, leftJoin) together all sound like they are moving the app-ORM
> interface closer to raw SQL rather than away from it. Given the more
> long-standing interest in getting the ORM working with things like app
> engine's BigTable, I can't see a proposal for movement more towards raw SQL
> being accepted.

I agree with this sentiment.  To me, the example API shown here reads
almost just like SQL.   There's already a mechanism for writing raw
SQL, and the example provided doesn't seem to be a whole lot simpler.

Maybe a solution is some extension to django.db.connection that would
let you get back objects with named fields (or maybe just dicts) in
place of plain tuples?   You'd get all the flexibility that you want,
but more readable code (accessing field names rather than sequence
indices).

Taylor

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Reply | Threaded
Open this post in threaded view
|

Re: GSoC10 Idea: Additional queryset methods

Lucas van Dijk
I will only add new things, and won't break any current API.

Why this is needed:
You can actually do *database independent* queries with full control.
The current API is IMO limited when trying to execute queries with
joins. Especially when you want to aggregate data from the joined
table. Yes, since django 1.1 you have the aggregate and annote
methods, but this doesn't work with generic foreign keys.
select_related only works with foreign keys, and doesn't support
aggregation.

When you do care about optimizing the number of queries of your
website, you'll always need fairly complex joins. Currently the only
way to create such join, is the QuerySet.query.join method. A method
which is used in a totally other way than the QuerySet API.

Didn't know about the same API is used for the app engine, but if the
current api works, there must be a way to let this work with it too.
Maybe the method names won't really match with what you're really
doing, it's up to you guys if that's a real problem or not.

Lucas

On 19 mrt, 23:01, Taylor Marshall <[hidden email]>
wrote:

> On Fri, Mar 19, 2010 at 11:09 AM, Karen Tracey <[hidden email]> wrote:
> > However, it does seem to be in direct conflict with the idea of getting to
> > the ORM working on non-SQL databases. The additional methods mentioned
> > (select, from, leftJoin) together all sound like they are moving the app-ORM
> > interface closer to raw SQL rather than away from it. Given the more
> > long-standing interest in getting the ORM working with things like app
> > engine's BigTable, I can't see a proposal for movement more towards raw SQL
> > being accepted.
>
> I agree with this sentiment.  To me, the example API shown here reads
> almost just like SQL.   There's already a mechanism for writing raw
> SQL, and the example provided doesn't seem to be a whole lot simpler.
>
> Maybe a solution is some extension to django.db.connection that would
> let you get back objects with named fields (or maybe just dicts) in
> place of plain tuples?   You'd get all the flexibility that you want,
> but more readable code (accessing field names rather than sequence
> indices).
>
> Taylor

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Reply | Threaded
Open this post in threaded view
|

Re: GSoC10 Idea: Additional queryset methods

Malcolm Tredinnick
In reply to this post by Lucas van Dijk
On Fri, 2010-03-19 at 05:39 -0700, Sh4wn wrote:
[..]

> I want to refactor the QuerySet and other databases related API's,
> which adds an additional method of selecting records, and provides
> more control of what data you'll select, but probably is a bit more
> complex than the current API. It's a bit inspired by Doctrine, a PHP
> ORM, and yes, one of the few decent PHP projects.
>
> Example 1.1:
>     qs = QuerySet().select(News.all, User.username,
>             Count('c__id'), comment_average=Avg(OtherModel.field)
>         ).from(News) \
>         .leftJoin('comments c')
>         .filter(news__title="value") \
>         .order_by('comment_average')
> (This is just a not-working dummy example)
>
> As you can see you can explicitly set the fields and tables you want
> to select. Beside that the existing queryset methods will remain to
> work. There'll be a slight difference though: instead of the related
> name in filter/order_by/other methods you don't use the related name,
> but the table name specified in the from method.

In other words, you want to write raw SQL, but in Python. Django
intentionally doesn't do this, focusing on functionality, rather than
describing the SQL that is created. This, for example, makes things
easier for using non-SQL storage and keeps the ORM "Pythonic", rather
than SQL-based. Any time you get to the level of having to think about
"left outer join" or database table names, you might as well just use
raw SQL.

Your motivational examples aren't particularly convincing as written,
since you don't explain *why* those tags are making multiple queries. I
will grant you that there are many Django applications around in public
and prviate use that could quite possibly make more efficient use of the
database. Sometimes this is because the maintainers have missed an
alternate approach, sometimes because the particular tag or method isn't
used enough to warrant extra optimisations just yet. However, I don't
see many of these cases as being symptomatic of a failure in the ORM
that requires a more complex API.

(By the way, adding extra methods too QuerySets are easy, since you can
subclass the Queryset class and use that instead. For anything that
would end up trying to justify the sort of step backwards -- which is
what you're proposing here, as you want to leak more about the raw
database layer into the Python API -- you would want to start by showing
how these are unavoidably useful via a set of subclasses that are being
used in various applications.)

Regards,
Malcolm

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.