Aggregating the results of a .union query without using .raw, is it possible?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Aggregating the results of a .union query without using .raw, is it possible?

Jo-2
I have a table that looks like this

date                car_crashes         city
01.01               1                   Washington
01.02               4                   Washington
01.03               0                   Washington
01.04               2                   Washington
01.05               0                   Washington
01.06               3                   Washington
01.07               4                   Washington
01.08               1                   Washington
01.01               0                   Detroit
01.02               2                   Detroit
01.03               4                   Detroit
01.04               2                   Detroit
01.05               0                   Detroit
01.06               3                   Detroit
01.07               1                   Detroit


I want to know how many car crashes for each day happened in the entire nation, and I can do that with this:


Model.values("date") \
.annotate(car_crashes=Sum('car_crashes')) \
.values("date", "car_crashes")



Now, let's suppose I have an array like this:

weights = [
   
{
       
"city": "Washington",
       
"weight": 1,
   
},
   
{
       
"city": "Detroit",
       
"weight": 2,
   
}
]



This means that Detroit's car crashes should be multiplied by 2 before being aggregated with Washington's.

It can be done like this:

from django.db.models import IntegerField


when_list
= [When(city=w['city'], then=w['weight']) for w in weights]
case_params
= {'default': 1, 'output_field': IntegerField()}


Model.objects.values('date') \
   
.annotate(
        weighted_car_crashes
=Sum(
            F
('car_crashes') * Case(*when_list, **case_params)
   
))

However, this generates very slow SQL code, especially as more properties and a larger array are introduced.

Another solution which is way faster but still sub-optimal is using pandas to :

aggregated = false
for weight in weights:

 ag
= Model.values("date") \
 
.annotate(car_crashes=Sum('car_crashes')) \
 
.values("date", "car_crashes")


 
if aggregated is False:
 aggregated
= ag
 
else:
 aggregated
= aggregated.union(ag)


aggregated
= pd.DataFrame(aggregated)
if len(weights) > 1:
 aggregated
= aggregated.groupby("date", as_index=False).sum(level=[1])


This is faster, but still not as fast as what happens if, before calling pandas, I take the aggregated.query string and
wrap it with a few lines of SQL.


SELECT "date", sum("car_crashes") FROM (


// String from Python
str
(aggregated.query)


)


This works perfectly when pasted into my database SQL. I could do this in Python/Django using .raw() but the documentation says to ask here before using .raw() as mostly anything could be acomplished with the ORM.

Yet, I don't see how. Once I call .union on 2 querysets, I cannot aggregate further.

aggregated.union(ag).annotate(cc=Sum('car_crashes'))

gives

Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate




Is this possible to do with the Django ORM or should I use .raw()?

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/3e3e80a1-29e4-47bd-8688-f826063d2b02%40googlegroups.com.
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating the results of a .union query without using .raw, is it possible?

Jo-2
My bad, the correct SQL query is this:

SELECT "date", sum("car_crashes") FROM (


// String from Python
str
(aggregated.query)


) as "aggregated" GROUP BY "date"



Il giorno lunedì 19 agosto 2019 23:10:47 UTC+2, Jo ha scritto:
I have a table that looks like this

date                car_crashes         city
01.01               1                   Washington
01.02               4                   Washington
01.03               0                   Washington
01.04               2                   Washington
01.05               0                   Washington
01.06               3                   Washington
01.07               4                   Washington
01.08               1                   Washington
01.01               0                   Detroit
01.02               2                   Detroit
01.03               4                   Detroit
01.04               2                   Detroit
01.05               0                   Detroit
01.06               3                   Detroit
01.07               1                   Detroit


I want to know how many car crashes for each day happened in the entire nation, and I can do that with this:


Model.values("date") \
.annotate(car_crashes=Sum('car_crashes')) \
.values("date", "car_crashes")



Now, let's suppose I have an array like this:

weights = [
   
{
       
"city": "Washington",
       
"weight": 1,
   
},
   
{
       
"city": "Detroit",
       
"weight": 2,
   
}
]



This means that Detroit's car crashes should be multiplied by 2 before being aggregated with Washington's.

It can be done like this:

from django.db.models import IntegerField


when_list
= [When(city=w['city'], then=w['weight']) for w in weights]
case_params
= {'default': 1, 'output_field': IntegerField()}


Model.objects.values('date') \
   
.annotate(
        weighted_car_crashes
=Sum(
            F
('car_crashes') * Case(*when_list, **case_params)
   
))

However, this generates very slow SQL code, especially as more properties and a larger array are introduced.

Another solution which is way faster but still sub-optimal is using pandas to :

aggregated = false
for weight in weights:

 ag
= Model.values("date") \
 
.annotate(car_crashes=Sum('car_crashes')) \
 
.values("date", "car_crashes")


 
if aggregated is False:
 aggregated
= ag
 
else:
 aggregated
= aggregated.union(ag)


aggregated
= pd.DataFrame(aggregated)
if len(weights) > 1:
 aggregated
= aggregated.groupby("date", as_index=False).sum(level=[1])


This is faster, but still not as fast as what happens if, before calling pandas, I take the aggregated.query string and
wrap it with a few lines of SQL.


SELECT "date", sum("car_crashes") FROM (


// String from Python
str
(aggregated.query)


)


This works perfectly when pasted into my database SQL. I could do this in Python/Django using .raw() but the documentation says to ask here before using .raw() as mostly anything could be acomplished with the ORM.

Yet, I don't see how. Once I call .union on 2 querysets, I cannot aggregate further.

aggregated.union(ag).annotate(cc=Sum('car_crashes'))

gives

Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate




Is this possible to do with the Django ORM or should I use .raw()?

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/7096fdff-ec55-4d1b-8110-9fd6bb70f9e1%40googlegroups.com.
Reply | Threaded
Open this post in threaded view
|

Re: Aggregating the results of a .union query without using .raw, is it possible?

Simon Charette
In reply to this post by Jo-2
I'm afraid the ORM doesn't support aggregation over unions yet else I would
have expected the following to work.

objects = Model.objects
querysets = (
    objects.filter(city=city).values(
        'date', weighted_car_crashes=F('car_crashes') * weight
    ) for city, weight in weights
)
union = itertools.reduce(QuerySet.union, querysets, querysets[0])
queryset = union.annotate(cc=Sum('weighted_car_crashes')).values('date', 'cc')

Best,
Simon

Le lundi 19 août 2019 17:10:47 UTC-4, Jo a écrit :
I have a table that looks like this

date                car_crashes         city
01.01               1                   Washington
01.02               4                   Washington
01.03               0                   Washington
01.04               2                   Washington
01.05               0                   Washington
01.06               3                   Washington
01.07               4                   Washington
01.08               1                   Washington
01.01               0                   Detroit
01.02               2                   Detroit
01.03               4                   Detroit
01.04               2                   Detroit
01.05               0                   Detroit
01.06               3                   Detroit
01.07               1                   Detroit


I want to know how many car crashes for each day happened in the entire nation, and I can do that with this:


Model.values("date") \
.annotate(car_crashes=Sum('car_crashes')) \
.values("date", "car_crashes")



Now, let's suppose I have an array like this:

weights = [
   
{
       
"city": "Washington",
       
"weight": 1,
   
},
   
{
       
"city": "Detroit",
       
"weight": 2,
   
}
]



This means that Detroit's car crashes should be multiplied by 2 before being aggregated with Washington's.

It can be done like this:

from django.db.models import IntegerField


when_list
= [When(city=w['city'], then=w['weight']) for w in weights]
case_params
= {'default': 1, 'output_field': IntegerField()}


Model.objects.values('date') \
   
.annotate(
        weighted_car_crashes
=Sum(
            F
('car_crashes') * Case(*when_list, **case_params)
   
))

However, this generates very slow SQL code, especially as more properties and a larger array are introduced.

Another solution which is way faster but still sub-optimal is using pandas to :

aggregated = false
for weight in weights:

 ag
= Model.values("date") \
 
.annotate(car_crashes=Sum('car_crashes')) \
 
.values("date", "car_crashes")


 
if aggregated is False:
 aggregated
= ag
 
else:
 aggregated
= aggregated.union(ag)


aggregated
= pd.DataFrame(aggregated)
if len(weights) > 1:
 aggregated
= aggregated.groupby("date", as_index=False).sum(level=[1])


This is faster, but still not as fast as what happens if, before calling pandas, I take the aggregated.query string and
wrap it with a few lines of SQL.


SELECT "date", sum("car_crashes") FROM (


// String from Python
str
(aggregated.query)


)


This works perfectly when pasted into my database SQL. I could do this in Python/Django using .raw() but the documentation says to ask here before using .raw() as mostly anything could be acomplished with the ORM.

Yet, I don't see how. Once I call .union on 2 querysets, I cannot aggregate further.

aggregated.union(ag).annotate(cc=Sum('car_crashes'))

gives

Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate




Is this possible to do with the Django ORM or should I use .raw()?

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/3e1aedd7-8eb8-4fd6-9116-ed835719deb6%40googlegroups.com.