When filtring on a subquery expressions, the subquery appears twice in the SELECT and the WHERE

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

When filtring on a subquery expressions, the subquery appears twice in the SELECT and the WHERE

Haki Benita
Hey,
I'm trying to filter on a subquery expressions using Exists and I ran into a weird issue. 
According to the docs, to filter on subquery you first need to annotate it, and than filter on it. This causes the subquery to appear both in the WHERE and in the SELECT which can lead to poor performance.

For example, my queryset:

Payment
.annotate(reconciled=Exists(
   
Record
   
.objects
   
.filter(payment_id=OuterRef('pk'))
   
.values_list('payment_id')
))
.filter(reconciled=True)

In the resulting SQL, the subquery is used in WHERE and appears in the SELECT as well:

SELECT
   
"payment"."id",
   
-- ... many more fields
   
EXISTS (
        SELECT U0
."id"
        FROM
"record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) AS "reconciled"

FROM
   
"payment"
WHERE
    EXISTS (
        SELECT U0
."id"
        FROM
"leumicard_record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) = False



This causes the "exists" query to be evaluated twice:

 Seq Scan on payment  (cost=0.00..63982927.73 rows=4772627 width=155)
   
Filter: (NOT (SubPlan 2))
   
SubPlan 1
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)
   
SubPlan 2
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0_1  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)

This is most likely because the subquery is "annotated" and so the ORM adds it to the values list.

I tried to `defer` the field, and got the following error:

django.core.exceptions.FieldDoesNotExist: Payment has no field named 'reconciled'

To make sure that the query can be executed as intended without the subquery in the select list, I tried to explicitly list "values_list":


Payment
.annotate(reconciled=Exists(
   
Record
   
.objects
   
.filter(payment_id=OuterRef('pk'))
   
.values_list('payment_id')
))
.filter(reconciled=True)
.values_list('pk')

Query is now as intended:

SELECT
   
"payment"."id"
FROM
   
"payment"
WHERE
    EXISTS
(
        SELECT U0
."payment_id"
        FROM
"record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) = False

Plan is cheaper:


 
Seq Scan on payment  (cost=0.00..42747797.33 rows=4772627 width=4)
   
Filter: (NOT (SubPlan 1))
   
SubPlan 1
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)

My questions are:
-  Am I doing it right?
- Is this intended?
- What is the best way to exclude the subquery from the query SELECT list?

Thanks,
Haki Benita.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/b8213fe1-fbb0-4bf3-87fd-3a6f323456ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: When filtring on a subquery expressions, the subquery appears twice in the SELECT and the WHERE

Simon Charette
Hello Haki,

This should probably have been posted to the django-users list given this one
is for the development of Django itself but since there has been some recent and ungoing
development in this area I'll reply here.

First there has been work to reuse annotation aliases that is in the master branch but
haven't been released yet[0]. I'm pretty sure it would reuse the "reconciled" alias in your
case but if it doesn't I'd encourage you to submit an optimization ticket to make it so.

Secondly there's an active PR to allow expressions resolving to a boolean field to be
directly passed to .filter()[1]. That would allow you to pass your Exists expression
directly to filter() instead of annotating it first and thus avoid the double evaluation
issue you are experiencing.

Finally there's an accepted feature request for adding an alias() method that would
work how you'd expect `annotate(foo).defer('foo')` to do[2]. Maybe annotate().defer()
is more intuitive than adding yet another method.

Unfortunately I can't think of another of way of performing a single EXISTS() using
any of the currently released versions of Django.

Cheers,
Simon

[0] https://github.com/django/django/commit/1ca825e4dc186da2b93292b5c848a3e5445968d7
[1] https://github.com/django/django/pull/8119
[2] https://code.djangoproject.com/ticket/27719

Le mardi 25 juin 2019 09:12:46 UTC-4, Haki Benita a écrit :
Hey,
I'm trying to filter on a subquery expressions using Exists and I ran into a weird issue. 
<a href="https://docs.djangoproject.com/en/2.2/ref/models/expressions/#filtering-on-a-subquery-expression" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2F2.2%2Fref%2Fmodels%2Fexpressions%2F%23filtering-on-a-subquery-expression\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNF5cwnzw0pEeOGazO1zSY727KI3-A&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2F2.2%2Fref%2Fmodels%2Fexpressions%2F%23filtering-on-a-subquery-expression\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNF5cwnzw0pEeOGazO1zSY727KI3-A&#39;;return true;">According to the docs, to filter on subquery you first need to annotate it, and than filter on it. This causes the subquery to appear both in the WHERE and in the SELECT which can lead to poor performance.

For example, my queryset:

Payment
.annotate(reconciled=Exists(
   
Record
   
.objects
   
.filter(payment_id=OuterRef('pk'))
   
.values_list('payment_id')
))
.filter(reconciled=True)

In the resulting SQL, the subquery is used in WHERE and appears in the SELECT as well:

SELECT
   
"payment"."id",
   
-- ... many more fields
   
EXISTS (
        SELECT U0
."id"
        FROM
"record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) AS "reconciled"

FROM
   
"payment"
WHERE
    EXISTS (
        SELECT U0
."id"
        FROM
"leumicard_record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) = False



This causes the "exists" query to be evaluated twice:

 Seq Scan on payment  (cost=0.00..63982927.73 rows=4772627 width=155)
   
Filter: (NOT (SubPlan 2))
   
SubPlan 1
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)
   
SubPlan 2
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0_1  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)

This is most likely because the subquery is "annotated" and so the ORM adds it to the values list.

I tried to `defer` the field, and got the following error:

django.core.exceptions.FieldDoesNotExist: Payment has no field named 'reconciled'

To make sure that the query can be executed as intended without the subquery in the select list, I tried to explicitly list "values_list":


Payment
.annotate(reconciled=Exists(
   
Record
   
.objects
   
.filter(payment_id=OuterRef('pk'))
   
.values_list('payment_id')
))
.filter(reconciled=True)
.values_list('pk')

Query is now as intended:

SELECT
   
"payment"."id"
FROM
   
"payment"
WHERE
    EXISTS
(
        SELECT U0
."payment_id"
        FROM
"record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) = False

Plan is cheaper:


 
Seq Scan on payment  (cost=0.00..42747797.33 rows=4772627 width=4)
   
Filter: (NOT (SubPlan 1))
   
SubPlan 1
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)

My questions are:
-  Am I doing it right?
- Is this intended?
- What is the best way to exclude the subquery from the query SELECT list?

Thanks,
Haki Benita.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/468fc3c1-8536-4143-8c3f-2cf67424a65f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: When filtring on a subquery expressions, the subquery appears twice in the SELECT and the WHERE

Haki Benita
Thanks for the quick reply Simon.

I'm currently unable to test the [0] commit for this specific use case because of some compat issues in my project. I'm also not sure how the solution can work in SQL.

The PR to allow using Exists directly in filter sounds like it can solve the problem for this particular use case.

Any way, I'm back to `__in` for now ;)
I wont bother the group with it...

Thanks,
Haki.

On Tuesday, June 25, 2019 at 5:01:32 PM UTC+3, charettes wrote:
Hello Haki,

This should probably have been posted to the django-users list given this one
is for the development of Django itself but since there has been some recent and ungoing
development in this area I'll reply here.

First there has been work to reuse annotation aliases that is in the master branch but
haven't been released yet[0]. I'm pretty sure it would reuse the "reconciled" alias in your
case but if it doesn't I'd encourage you to submit an optimization ticket to make it so.

Secondly there's an active PR to allow expressions resolving to a boolean field to be
directly passed to .filter()[1]. That would allow you to pass your Exists expression
directly to filter() instead of annotating it first and thus avoid the double evaluation
issue you are experiencing.

Finally there's an accepted feature request for adding an alias() method that would
work how you'd expect `annotate(foo).defer('foo')` to do[2]. Maybe annotate().defer()
is more intuitive than adding yet another method.

Unfortunately I can't think of another of way of performing a single EXISTS() using
any of the currently released versions of Django.

Cheers,
Simon

[0] <a href="https://github.com/django/django/commit/1ca825e4dc186da2b93292b5c848a3e5445968d7" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fcommit%2F1ca825e4dc186da2b93292b5c848a3e5445968d7\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFzO3CuF_YJwmUIig7BXVl-rUGTXA&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fcommit%2F1ca825e4dc186da2b93292b5c848a3e5445968d7\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFzO3CuF_YJwmUIig7BXVl-rUGTXA&#39;;return true;">https://github.com/django/django/commit/1ca825e4dc186da2b93292b5c848a3e5445968d7
[1] <a href="https://github.com/django/django/pull/8119" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdjango%2Fdjango%2Fpull%2F8119\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHrjaRCCchFLscOuJEhGWyYA1MGTA&#39;;return true;">https://github.com/django/django/pull/8119
[2] <a href="https://code.djangoproject.com/ticket/27719" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fcode.djangoproject.com%2Fticket%2F27719\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHXLlcoILhMfwViibVvTLw-xXcgsw&#39;;return true;">https://code.djangoproject.com/ticket/27719

Le mardi 25 juin 2019 09:12:46 UTC-4, Haki Benita a écrit :
Hey,
I'm trying to filter on a subquery expressions using Exists and I ran into a weird issue. 
<a href="https://docs.djangoproject.com/en/2.2/ref/models/expressions/#filtering-on-a-subquery-expression" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2F2.2%2Fref%2Fmodels%2Fexpressions%2F%23filtering-on-a-subquery-expression\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNF5cwnzw0pEeOGazO1zSY727KI3-A&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fdocs.djangoproject.com%2Fen%2F2.2%2Fref%2Fmodels%2Fexpressions%2F%23filtering-on-a-subquery-expression\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNF5cwnzw0pEeOGazO1zSY727KI3-A&#39;;return true;">According to the docs, to filter on subquery you first need to annotate it, and than filter on it. This causes the subquery to appear both in the WHERE and in the SELECT which can lead to poor performance.

For example, my queryset:

Payment
.annotate(reconciled=Exists(
   
Record
   
.objects
   
.filter(payment_id=OuterRef('pk'))
   
.values_list('payment_id')
))
.filter(reconciled=True)

In the resulting SQL, the subquery is used in WHERE and appears in the SELECT as well:

SELECT
   
"payment"."id",
   
-- ... many more fields
   
EXISTS (
        SELECT U0
."id"
        FROM
"record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) AS "reconciled"

FROM
   
"payment"
WHERE
    EXISTS (
        SELECT U0
."id"
        FROM
"leumicard_record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) = False



This causes the "exists" query to be evaluated twice:

 Seq Scan on payment  (cost=0.00..63982927.73 rows=4772627 width=155)
   
Filter: (NOT (SubPlan 2))
   
SubPlan 1
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)
   
SubPlan 2
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0_1  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)

This is most likely because the subquery is "annotated" and so the ORM adds it to the values list.

I tried to `defer` the field, and got the following error:

django.core.exceptions.FieldDoesNotExist: Payment has no field named 'reconciled'

To make sure that the query can be executed as intended without the subquery in the select list, I tried to explicitly list "values_list":


Payment
.annotate(reconciled=Exists(
   
Record
   
.objects
   
.filter(payment_id=OuterRef('pk'))
   
.values_list('payment_id')
))
.filter(reconciled=True)
.values_list('pk')

Query is now as intended:

SELECT
   
"payment"."id"
FROM
   
"payment"
WHERE
    EXISTS
(
        SELECT U0
."payment_id"
        FROM
"record" U0
        WHERE U0
."payment_id" = ("payment"."id")
   
) = False

Plan is cheaper:


 
Seq Scan on payment  (cost=0.00..42747797.33 rows=4772627 width=4)
   
Filter: (NOT (SubPlan 1))
   
SubPlan 1
     
->  Index Only Scan using record_payment_id_058ca67f on leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           
Index Cond: (payment_id = payment.id)

My questions are:
-  Am I doing it right?
- Is this intended?
- What is the best way to exclude the subquery from the query SELECT list?

Thanks,
Haki Benita.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/0d355841-b648-46c6-85d9-64721c0f13b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.