Add Alias or annotations without group-by support?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Add Alias or annotations without group-by support?

Cristiano Coelho
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/4e4dbcd9-9c49-468b-b633-ca27daf3fe69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Add Alias or annotations without group-by support?

Jared Proffitt
I have also run into this exact problem. Would love to get this fixed. Have you found a good workaround?

On Tuesday, December 26, 2017 at 12:37:16 PM UTC-6, Cristiano Coelho wrote:
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (<a href="https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;">https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (<a href="https://code.djangoproject.com/ticket/27719" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/fed03032-9477-428c-9a69-7fcf8fe2ddd6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Add Alias or annotations without group-by support?

Cristiano Coelho
The workaround, although extremely ugly and which will probably cause issues in the future (reason I only used it for the model I needed to do those odd queries) was to use a custom queryset/manager. Something like this.

class FasterCountQuerySet(QuerySet):
   
def count(self):
       
return super(FasterCountQuerySet, self.values('pk')).count()
FasterCountManager = Manager.from_queryset(FasterCountQuerySet)

But again, this is extremely ugly and will still cause a subquery, but without the unnecessary group by and extra function calls.


El miércoles, 7 de marzo de 2018, 19:48:01 (UTC-3), Jared Proffitt escribió:
I have also run into this exact problem. Would love to get this fixed. Have you found a good workaround?

On Tuesday, December 26, 2017 at 12:37:16 PM UTC-6, Cristiano Coelho wrote:
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (<a href="https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;">https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (<a href="https://code.djangoproject.com/ticket/27719" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/44b855e4-3cc0-4920-9b0d-d5172a17bfbf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Add Alias or annotations without group-by support?

Josh Smeaton
Would teaching filter() and friends to use expressions directly solve your issue? You suggested using `alias` upthread, but that's only really required so you can refer to it later? Unless you wanted to refer to the field more than once, having each queryset method respect expressions should be enough I think.

https://github.com/django/django/pull/8119 adds boolean expression support to filter. I believe most other queryset methods have support for expressions now (order_by, values/values_list).

For the alias used multiple times case, it should be enough to annotate and then restrict with values if you don't actually want it in the select/group list.

On Friday, 9 March 2018 00:22:00 UTC+11, Cristiano Coelho wrote:
The workaround, although extremely ugly and which will probably cause issues in the future (reason I only used it for the model I needed to do those odd queries) was to use a custom queryset/manager. Something like this.

class FasterCountQuerySet(QuerySet):
   
def count(self):
       
return super(FasterCountQuerySet, self.values('pk')).count()
FasterCountManager = Manager.from_queryset(FasterCountQuerySet)

But again, this is extremely ugly and will still cause a subquery, but without the unnecessary group by and extra function calls.


El miércoles, 7 de marzo de 2018, 19:48:01 (UTC-3), Jared Proffitt escribió:
I have also run into this exact problem. Would love to get this fixed. Have you found a good workaround?

On Tuesday, December 26, 2017 at 12:37:16 PM UTC-6, Cristiano Coelho wrote:
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (<a href="https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;">https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (<a href="https://code.djangoproject.com/ticket/27719" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6561d717-265b-435c-ad0f-cebe51afe2d3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Add Alias or annotations without group-by support?

Cristiano Coelho
It wouldn't work if you also want to order by the annotated value.

El viernes, 9 de marzo de 2018, 8:27:36 (UTC-3), Josh Smeaton escribió:
Would teaching filter() and friends to use expressions directly solve your issue? You suggested using `alias` upthread, but that's only really required so you can refer to it later? Unless you wanted to refer to the field more than once, having each queryset method respect expressions should be enough I think.

<a href="https://github.com/django/django/pull/8119" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;">https://github.com/django/django/pull/8119 adds boolean expression support to filter. I believe most other queryset methods have support for expressions now (order_by, values/values_list).

For the alias used multiple times case, it should be enough to annotate and then restrict with values if you don't actually want it in the select/group list.

On Friday, 9 March 2018 00:22:00 UTC+11, Cristiano Coelho wrote:
The workaround, although extremely ugly and which will probably cause issues in the future (reason I only used it for the model I needed to do those odd queries) was to use a custom queryset/manager. Something like this.

class FasterCountQuerySet(QuerySet):
   
def count(self):
       
return super(FasterCountQuerySet, self.values('pk')).count()
FasterCountManager = Manager.from_queryset(FasterCountQuerySet)

But again, this is extremely ugly and will still cause a subquery, but without the unnecessary group by and extra function calls.


El miércoles, 7 de marzo de 2018, 19:48:01 (UTC-3), Jared Proffitt escribió:
I have also run into this exact problem. Would love to get this fixed. Have you found a good workaround?

On Tuesday, December 26, 2017 at 12:37:16 PM UTC-6, Cristiano Coelho wrote:
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (<a href="https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;">https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (<a href="https://code.djangoproject.com/ticket/27719" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/e19e6189-418a-42b4-a666-3202f2337090%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Add Alias or annotations without group-by support?

Josh Smeaton
Sure - but you can always save the expression to a variable and use it multiple times.

mycalc = MyFunc('a', 'b')
Model.objects.filter(GreaterEqual(mycalc, 0.6)).order_by(mycalc)

I think we already have the building blocks we need to avoid adding another queryset method.

On Saturday, 10 March 2018 14:01:41 UTC+11, Cristiano Coelho wrote:
It wouldn't work if you also want to order by the annotated value.

El viernes, 9 de marzo de 2018, 8:27:36 (UTC-3), Josh Smeaton escribió:
Would teaching filter() and friends to use expressions directly solve your issue? You suggested using `alias` upthread, but that's only really required so you can refer to it later? Unless you wanted to refer to the field more than once, having each queryset method respect expressions should be enough I think.

<a href="https://github.com/django/django/pull/8119" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;">https://github.com/django/django/pull/8119 adds boolean expression support to filter. I believe most other queryset methods have support for expressions now (order_by, values/values_list).

For the alias used multiple times case, it should be enough to annotate and then restrict with values if you don't actually want it in the select/group list.

On Friday, 9 March 2018 00:22:00 UTC+11, Cristiano Coelho wrote:
The workaround, although extremely ugly and which will probably cause issues in the future (reason I only used it for the model I needed to do those odd queries) was to use a custom queryset/manager. Something like this.

class FasterCountQuerySet(QuerySet):
   
def count(self):
       
return super(FasterCountQuerySet, self.values('pk')).count()
FasterCountManager = Manager.from_queryset(FasterCountQuerySet)

But again, this is extremely ugly and will still cause a subquery, but without the unnecessary group by and extra function calls.


El miércoles, 7 de marzo de 2018, 19:48:01 (UTC-3), Jared Proffitt escribió:
I have also run into this exact problem. Would love to get this fixed. Have you found a good workaround?

On Tuesday, December 26, 2017 at 12:37:16 PM UTC-6, Cristiano Coelho wrote:
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (<a href="https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;">https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (<a href="https://code.djangoproject.com/ticket/27719" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/c481d2af-9277-4e76-8937-e8fff138d282%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Add Alias or annotations without group-by support?

Cristiano Coelho
Would that actually end up executing the same function twice?

I didn't state it on the original question, but the biggest issue is that on my use case, the annotation step is actually rather complicated and such wrapped in a method on the model, and then it's up to the external code to filter and sort by the annotated value. Having to use the expression every single time it's needed would defeat the purpose of it.

I agree though that having more methods on the queryset is bad, I would rather improve the annotation logic to be able to handle these cases, but might also be difficult.


El sábado, 10 de marzo de 2018, 8:51:32 (UTC-3), Josh Smeaton escribió:
Sure - but you can always save the expression to a variable and use it multiple times.

mycalc = MyFunc('a', 'b')
Model.objects.filter(GreaterEqual(mycalc, 0.6)).order_by(mycalc)

I think we already have the building blocks we need to avoid adding another queryset method.

On Saturday, 10 March 2018 14:01:41 UTC+11, Cristiano Coelho wrote:
It wouldn't work if you also want to order by the annotated value.

El viernes, 9 de marzo de 2018, 8:27:36 (UTC-3), Josh Smeaton escribió:
Would teaching filter() and friends to use expressions directly solve your issue? You suggested using `alias` upthread, but that's only really required so you can refer to it later? Unless you wanted to refer to the field more than once, having each queryset method respect expressions should be enough I think.

<a href="https://github.com/django/django/pull/8119" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;">https://github.com/django/django/pull/8119 adds boolean expression support to filter. I believe most other queryset methods have support for expressions now (order_by, values/values_list).

For the alias used multiple times case, it should be enough to annotate and then restrict with values if you don't actually want it in the select/group list.

On Friday, 9 March 2018 00:22:00 UTC+11, Cristiano Coelho wrote:
The workaround, although extremely ugly and which will probably cause issues in the future (reason I only used it for the model I needed to do those odd queries) was to use a custom queryset/manager. Something like this.

class FasterCountQuerySet(QuerySet):
   
def count(self):
       
return super(FasterCountQuerySet, self.values('pk')).count()
FasterCountManager = Manager.from_queryset(FasterCountQuerySet)

But again, this is extremely ugly and will still cause a subquery, but without the unnecessary group by and extra function calls.


El miércoles, 7 de marzo de 2018, 19:48:01 (UTC-3), Jared Proffitt escribió:
I have also run into this exact problem. Would love to get this fixed. Have you found a good workaround?

On Tuesday, December 26, 2017 at 12:37:16 PM UTC-6, Cristiano Coelho wrote:
Hello, I'm having a hard time explaining the exact issue but I hope it's clear enough.


Following this issue (<a href="https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ&#39;;return true;">https://groups.google.com/forum/#!searchin/django-users/cristiano%7Csort:date/django-users/q6XdfyK29HA/TcE8oFitBQAJ) from django users and a related ticket (<a href="https://code.djangoproject.com/ticket/27719" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719) that seems to be left out or forgotten already.

There has to be a way to alias or annotate a value given an expression or SQL Function that doesn't necessarily aggregates data but rather work on a single value.

Right now as shown on the django-users post, using annotate for this purpose will cause unexpected grouping and sub querying that could result in very slow and hard to debug queries.

The core issue is that using annotate without a previous call either vaues or values_list, will work as expected, simply annotating a value and returning it as an additional column, but if an aggregate is added afterwards (such as count), the final query ends up being a redundant query where the annotated value is added to a group by clause (group by id + column), to a column as part of the select (function called twice) and then wrapped into a select * (subquery), which makes the extra column as part of the select and group by useless, unless the query had any kind of left/inner join in which case the group by might make sense (although not sure about the column showing up on the select clause)

The ugly work around is to simply add a .values('id') at the end so the annotated value doesn't show on the group by and select sections, although the nested query still happens.


For this reason, there's currently no way to achieve the above without ugly work arounds or unnecessary database performance hits.

The easiest option I believe would be to follow the ticket in order to implement an alias call that works exactly like annotate but doesn't trigger any grouping.

A more complicated option is probably trying to make annotate/aggregate smarter, so all the unnecessary grouping and sub querying doesn't happen unless needed, for example, if the queryset didn't call values/values_list or if there are no relationships/joins used.


Example/demostration:

Given the following queryset

query1 = MyModel.objects.annotate(x=MyFunction('a', 'b')).filter(x__gte=0.6).order_by('-x')


query1 SQL is good and looks like:

SELECT id, a, b, myfunction(a, b) as x
FROM mymodel
WHERE myfunction
(a, b) >= 0.6
ORDER BY x desc

Notice how there's no group by, the ORM was smart enough to not include it since there was no previous call to values/values_list


If we run query1.count() the final SQL looks like:

SELECT COUNT(*) FROM (
    SELECT id
, myfunction(a, b) as x
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
    GROUP BY id
, myfunction(a ,b)
) subquery

which if myfunction is slow, will add a massive slow down that's not even needed, and should actually be just:

SELECT count(*)
FROM mymodel
WHERE myfunction
(a ,b) >= 0.6


while the other query should ONLY happen if the group by makes sense (i.e, if there's a join somewhere, or a values/values_list was used previously so id is not part of the group by statement)

but if we work around the issue adding a query1.values('id').count(), the final query ends up better:

SELECT COUNT(*) FROM (
    SELECT id
    FROM mymodel
    WHERE myfunction
(a ,b) >= 0.6
) subquery


I hope I could explain this clear enough with the example, and note that using a custom lookup is not possible since the value is required for the order_by to work.


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/4d465b22-e850-407a-99b5-724477c16c98%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.