Some thoughts about improving migration squashing

raphael
Hello everyone,

   I spent a couple hours last night trying to improve the migration squasher optimizer (migrations were taking almost 15 minutes in CI). I came up with a couple ideas for anyone interested in improvements:
 
 1- Having an interactive mode for squashing would be interesting. Currently, when squashing migrations, I do the following:
  • Generate an initial squash
  • Edit it (namely, move operations around so that more optimizations apply)
  • Remove the "replaces" attribute, then rerun migration squashing to "re-optimize"
  • Repeat until I get something I like, then add the original "replaces" attribute back
   It would be cool if instead, the process were (with a flag):
  • Generate an initial squash, but have the process wait for confirmation before "committing" this squash as final (while still writing out the file)
  • Edit the file, and tell the process to try re-optimizing the same file (getting around the "no-squash of squashes" rule)
  • Potentially, allow us to also step back
 For example, the "squashmigrations" command output could look like:

generated 0001_squashed_mig.py
optimize migration[yN]? 
<user inputs y>
regenerated 0001_squashed_mig.py
( 20 operations -> 10 operations)
optimize migration[Ynr]? 
<user inputs y>
regenerated 0001_squashed_mig.py
( No change in operation count)
optimize migration[Ynr]? 
<user inputs r>
rolled back to previous version
optimize migration[Ynr]? 
<user inputs n>
Saved migration
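
The prompt loop above could be sketched roughly as follows. This is only a toy illustration, not an existing Django feature: `interactive_optimize` and `drop_adjacent_duplicates` are hypothetical names, operations are plain strings, the answers are scripted instead of read from the terminal, and the manual editing step between passes is left out.

```python
from typing import Callable, Iterable, List

Ops = List[str]


def interactive_optimize(
    ops: Ops, optimize: Callable[[Ops], Ops], answers: Iterable[str]
) -> Ops:
    """Replay scripted y/n/r answers, keeping earlier versions for rollback."""
    history = [list(ops)]  # history[-1] is the version currently on disk
    for answer in answers:
        if answer == "y":  # run another optimizer pass
            history.append(optimize(history[-1]))
            before, after = len(history[-2]), len(history[-1])
            if before == after:
                print("( No change in operation count)")
            else:
                print(f"( {before} operations -> {after} operations)")
        elif answer == "r" and len(history) > 1:  # step back one version
            history.pop()
            print("rolled back to previous version")
        elif answer == "n":  # accept the current version
            break
    print("Saved migration")
    return history[-1]


def drop_adjacent_duplicates(ops: Ops) -> Ops:
    """Stand-in optimizer (hypothetical): collapse runs of identical entries."""
    return [op for i, op in enumerate(ops) if i == 0 or op != ops[i - 1]]


# Same answer sequence as in the transcript above: y, y, r, n.
final = interactive_optimize(
    ["a", "a", "b", "b", "c"], drop_adjacent_duplicates, answers="yyrn"
)
```

Keeping every version in a list is what makes the "r" answer cheap: rolling back is just dropping the last entry.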


A simpler alternative would be to add an "optimizemigration" command that just reads in a single migration and optimizes its operations, without touching any of the squashing logic.


 2- The reducer might be a bit too pessimistic

 Currently, the optimizer lets "reduce" operations (which take 2 operations and return 0, 1, or 2 operations, or None if nothing can be changed) do whatever they want. Because of that, if you have [A, B, C, D] and B depends on A, you can't reduce A and C, because the reduction might remove A.
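
To make the constraint concrete, here is a toy model of that pessimistic rule. It is a sketch under simplifying assumptions, not Django's optimizer: `Op`, `references`, `reduce_pair` and `optimize_once` are hypothetical names, and an operation is reduced to a small record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str           # "CreateModel" or "AddField" in this toy model
    model: str          # model the operation acts on
    field: str = ""     # AddField only: name of the field
    to: str = ""        # AddField only: foreign key target, if any
    fields: tuple = ()  # CreateModel only: (name, to) pairs


def references(op: Op, model: str) -> bool:
    """True if `op` touches `model` directly or through a foreign key."""
    return op.model == model or op.to == model or any(
        to == model for _, to in op.fields
    )


def reduce_pair(a: Op, c: Op) -> Optional[List[Op]]:
    """Return the operations replacing (a, c), or None if they don't combine."""
    if a.kind == "CreateModel" and c.kind == "AddField" and a.model == c.model:
        return [Op("CreateModel", a.model, fields=a.fields + ((c.field, c.to),))]
    return None


def optimize_once(ops: List[Op]) -> List[Op]:
    """One pessimistic pass: stop scanning as soon as something depends on `a`."""
    for i, a in enumerate(ops):
        for j in range(i + 1, len(ops)):
            in_between = ops[i + 1:j]
            if any(references(b, a.model) for b in in_between):
                break  # the reduction might remove `a`, so nothing may cross it
            result = reduce_pair(a, ops[j])
            if result is not None:
                return ops[:i] + in_between + result + ops[j + 1:]
    return ops


author = Op("CreateModel", "Author")
book = Op("CreateModel", "Book", fields=(("author", "Author"),))
name = Op("AddField", "Author", "name")
tag = Op("CreateModel", "Tag")

blocked = [author, book, name, tag]  # book depends on author: no reduction
free = [author, tag, name]           # tag is unrelated: the pair is reduced
```

In `blocked`, the pass gives up on `author` as soon as it meets `book`, even though folding `name` into `author` would be harmless.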

 In reality there are two kinds of reduction operations that we could be taking into account:

   - reducing "left". If you have [A, B, C, D] (for example, A is a CreateModel and C is an AddField for the same model), and you can reduce A and C into just A' (A with C), giving [A', B, D], then it doesn't matter that B depends on A.

The thing that matters is whether C depends on B (for example, C adds a foreign key to a model created in B). This is actually already encoded in the CreateModel + AddField reduction, but is perhaps a more general case.

In a sense, reducing A and C "to the left" means that we're bringing A and C closer together only by moving C. This is a major part of the potential reductions that the current optimizer is missing.
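
A minimal sketch of a "left" reduction, under the same kind of toy assumptions (the `Op` record and `reduce_left` are hypothetical, and the dependency check is deliberately conservative: any skipped operation on the model that C points to blocks the move):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str           # "CreateModel" or "AddField" in this toy model
    model: str          # model the operation acts on
    field: str = ""     # AddField only: name of the field
    to: str = ""        # AddField only: foreign key target, if any
    fields: tuple = ()  # CreateModel only: (name, to) pairs


def reduce_left(ops: List[Op], i: int, j: int) -> Optional[List[Op]]:
    """Fold ops[j] (AddField) into ops[i] (CreateModel), leaving A' at index i."""
    a, c = ops[i], ops[j]
    in_between = ops[i + 1:j]
    if not (a.kind == "CreateModel" and c.kind == "AddField" and a.model == c.model):
        return None
    # Only C moves, so only C's dependencies on the skipped operations matter.
    if any(c.to and c.to == b.model for b in in_between):
        return None
    merged = Op("CreateModel", a.model, fields=a.fields + ((c.field, c.to),))
    return ops[:i] + [merged] + in_between + ops[j + 1:]


author = Op("CreateModel", "Author")
book = Op("CreateModel", "Book", fields=(("author", "Author"),))
name = Op("AddField", "Author", "name")
favorite = Op("AddField", "Author", "favorite", to="Book")
tag = Op("CreateModel", "Tag")

# [A, B, C, D] -> [A', B, D], even though B depends on A.
reduced = reduce_left([author, book, name, tag], 0, 2)
# C has a foreign key to the model created in B, so C cannot move left.
refused = reduce_left([author, book, favorite, tag], 0, 2)
```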

   - reducing "right". If you have [A, B, C, D] (for example, A is a CreateModel and C is a DeleteModel for the same model), and you can reduce A and C into just C' (C with A), giving [B, C', D], then it does matter that B depends on A. C can't depend on B (assuming causality holds in our universe).

This is the current mechanism, essentially. If B depends on A, then you can't move A past B. 

 Removing both operations is a special case of reducing right (You can make C' into a no-op).
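
The "right" case can be sketched the same way. Again this is a toy model with hypothetical names (`Op`, `references`, `reduce_right`); it uses `DeleteModel`, Django's name for the operation that removes a model, and treats C' as a no-op.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str           # "CreateModel" or "DeleteModel" in this toy model
    model: str          # model the operation acts on
    fields: tuple = ()  # CreateModel only: (name, to) pairs


def references(op: Op, model: str) -> bool:
    """True if `op` touches `model` directly or through a foreign key."""
    return op.model == model or any(to == model for _, to in op.fields)


def reduce_right(ops: List[Op], i: int, j: int) -> Optional[List[Op]]:
    """Cancel ops[i] (CreateModel) against ops[j] (DeleteModel) at index j."""
    a, c = ops[i], ops[j]
    in_between = ops[i + 1:j]
    if not (a.kind == "CreateModel" and c.kind == "DeleteModel" and a.model == c.model):
        return None
    # A moves past the skipped operations, so none of them may depend on A.
    if any(references(b, a.model) for b in in_between):
        return None
    # C' is a no-op: creating and then deleting a model cancels out.
    return ops[:i] + in_between + ops[j + 1:]


temp = Op("CreateModel", "Temp")
drop_temp = Op("DeleteModel", "Temp")
book = Op("CreateModel", "Book")
linked_book = Op("CreateModel", "Book", fields=(("tmp", "Temp"),))
tag = Op("CreateModel", "Tag")

cancelled = reduce_right([temp, book, drop_temp, tag], 0, 2)
refused = reduce_right([temp, linked_book, drop_temp, tag], 0, 2)
```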

I had monkeypatched a special case of reducing left (taking a CreateModel and an AddField of different models and swapping them; for example, [CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), AddField(A.foo), CreateModel(B)]) and got decent results, but I think making the optimization code express these two concepts separately would catch even more of the optimizations that I saw the optimizer miss.
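
For illustration, the swap could look something like the following. This is not the actual monkeypatch, just a sketch of the idea with hypothetical names (`Op`, `swap_addfield_left`); the foreign key check is an added assumption so that a field pointing at the neighbouring model stays behind it.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    kind: str        # "CreateModel" or "AddField"
    model: str       # model the operation acts on
    field: str = ""  # AddField only: name of the field
    to: str = ""     # AddField only: foreign key target, if any


def swap_addfield_left(ops: List[Op]) -> List[Op]:
    """Move each AddField left past CreateModel operations of unrelated models."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops) - 1):
            left, right = ops[i], ops[i + 1]
            if (
                left.kind == "CreateModel"
                and right.kind == "AddField"
                and left.model != right.model  # never cross its own CreateModel
                and right.to != left.model     # nor the model it points to
            ):
                ops[i], ops[i + 1] = right, left
                changed = True
    return ops


swapped = swap_addfield_left(
    [Op("CreateModel", "A"), Op("CreateModel", "B"), Op("AddField", "A", "foo")]
)
kept = swap_addfield_left(
    [Op("CreateModel", "A"), Op("CreateModel", "B"), Op("AddField", "A", "b", to="B")]
)
```

Once the AddField sits next to its own CreateModel, an ordinary pairwise reduction can fold the two together.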


I hope some of this is useful 
 
  Raphael

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6042857b-fd48-467a-badc-1cf5305ecac4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Some thoughts about improving migration squashing

raphael
I ended up having some time today, so wrote up a management command for the first suggestion!

I called it "optimizemigration".


>>> ./manage.py optimizemigration appname 0001_squashed
 
# snipped django startup noise
 
Optimized from 9 operations to 4 operations

Optimized migration /Users/rtpg/proj/projname/projname/appname/migrations/0001_squashed_20170215.py


This reads in the migration file, runs the migration optimizer, and then outputs to the same file. Writing it has paid off almost immediately for me.

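Those who are interested can take a look here: https://github.com/rtpg/django/blob/optimizemigration/django/core/management/commands/optimizemigration.py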

What are the testing/coverage requirements for management commands like these?


Re: Some thoughts about improving migration squashing

Tim Graham-2
Hi Raphael,

It looks like a similar idea was proposed in https://groups.google.com/d/topic/django-developers/C1L-NhyQYG4/discussion. I don't think a ticket was ever created, so you can do that.

100% test coverage is required. Why would we accept untested code? ;-)

Re: Some thoughts about improving migration squashing

Florian Apolloner
In reply to this post by raphael
FWIW, I think that by default it could/should try to optimize all migrations of an app; manually specifying the migration name should be optional.

Re: Some thoughts about improving migration squashing

Markus Holtermann
In reply to this post by raphael
Thanks Raphael, that's a pretty good write up!

You're essentially speaking about 2 things here, in my opinion:

1. Adding a new feature for interactive squash
2. Improving the MigrationOptimizer

I certainly see a point for 2. Not sure how much for 1. Anyway, your reasoning for 2 sounds great! I'd be more than happy if you want to get this into Django! Can you create a respective ticket and drop your explanation in there, please :)

Cheers,

/Markus

Re: Some thoughts about improving migration squashing

Markus Holtermann
In reply to this post by Florian Apolloner
Something that might be interesting to look into when squashing all migrations in one app would be to assume that no migrations exist. That could then result in only 2 migrations that need to run through the optimizer (as opposed to, let's say, 20 migrations with many more operations).

/Markus

Re: Some thoughts about improving migration squashing

raphael
Markus: Now that I've written a command to optimize a single migration file, I think that it's sufficient for the "squash, edit, optimize" workflow that I was doing before. It's more about offering people a way to get their squashing done well until our optimizer becomes omniscient.

Florian: Having the command run on all migrations should be straightforward, I'll look into that!

Tim: It's not untested! I tested it locally by running it ;) I'll write up a couple test cases and some documentation.


I've posted one ticket for the management command to pass a selected migration through the optimizer, and another ticket for improvements to the optimizer itself.

Re: Some thoughts about improving migration squashing

Andrew Godwin-3
In reply to this post by Markus Holtermann

> You're essentially speaking about 2 things here, in my opinion:
>
> 1. Adding a new feature for interactive squash
> 2. Improving the MigrationOptimizer
>
> I certainly see a point for 2. Not sure how much for 1. Anyway, your reasoning for 2 sounds great! I'd be more than happy if you want to get this into Django! Can you create a respective ticket and drop your explanation in there, please :)


I am also definitely in support of 2), and I can see 1) being useful for migration sets that have a lot of operations the optimiser won't touch by default (e.g. SQL), but it's probably more work to be less generally useful.

Andrew
 

Re: Some thoughts about improving migration squashing

Simon Charette
In reply to this post by raphael
Hi Raphael,

I've been working on making the optimizer a bit smarter recently and came to the
same conclusion as you concerning the "left" and "right" optimizations.

This should be possible to solve by allowing `Operation.reduce()` to return the
full list of operations it replaces by appending `in_between` before or after the
operations it combines to perform "left" or "right" optimizations.
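
One way to picture this proposal is the sketch below. It is not the code from the pull request and not Django's `Operation.reduce()` signature: `Op`, `references`, the free function `reduce` and `optimize` are hypothetical names in a toy model, where the reduction itself decides whether `in_between` goes after the result ("left") or before it ("right").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str           # "CreateModel", "AddField" or "DeleteModel"
    model: str          # model the operation acts on
    field: str = ""     # AddField only: name of the field
    to: str = ""        # AddField only: foreign key target, if any
    fields: tuple = ()  # CreateModel only: (name, to) pairs


def references(op: Op, model: str) -> bool:
    """True if `op` touches `model` directly or through a foreign key."""
    return op.model == model or op.to == model or any(
        to == model for _, to in op.fields
    )


def reduce(a: Op, c: Op, in_between: List[Op]) -> Optional[List[Op]]:
    """Return everything that replaces [a, *in_between, c], or None."""
    if a.kind == "CreateModel" and c.kind == "AddField" and a.model == c.model:
        if any(c.to and c.to == b.model for b in in_between):
            return None  # C depends on a skipped operation
        merged = Op("CreateModel", a.model, fields=a.fields + ((c.field, c.to),))
        return [merged] + in_between  # "left": result goes before in_between
    if a.kind == "CreateModel" and c.kind == "DeleteModel" and a.model == c.model:
        if any(references(b, a.model) for b in in_between):
            return None  # a skipped operation depends on A
        return list(in_between)  # "right": the (empty) result goes after
    return None


def optimize(ops: List[Op]) -> List[Op]:
    """Apply reductions until no pair can be combined any more."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops)):
            for j in range(i + 1, len(ops)):
                result = reduce(ops[i], ops[j], ops[i + 1:j])
                if result is not None:
                    ops[i:j + 1] = result
                    changed = True
                    break
            if changed:
                break
    return ops


book = Op("CreateModel", "Book", fields=(("author", "Author"),))
optimized = optimize([
    Op("CreateModel", "Temp"),
    Op("CreateModel", "Author"),
    book,
    Op("AddField", "Author", "name"),
    Op("DeleteModel", "Temp"),
])
```

With this shape, the optimizer loop no longer needs its own dependency check; each reduction states which side it is safe to move toward.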

In the meantime I have a solution that makes the optimizer perform a bit better
that I'd be glad to get your feedback on[1]. One thing bugging me was that
`RemoveField` operations didn't have any reference to the definition of the field
they were removing, making optimization "through" them unsafe as they could be
operating on a related field.

Cheers,
Simon

[1] https://github.com/django/django/pull/7999/files

Re: Some thoughts about improving migration squashing

raphael
Hey Simon, 

 I looked through your PR and added a couple of comments. The main thing is that I think we can actually ignore the field context on "RemoveField", if only because the executor doesn't need it. Even though the field might be pointing to a related model, that doesn't prevent it from being optimized through.

 This is hard to explain, but intuitively, each "RemoveField" is paired with an "AddField" or "CreateModel" that does depend on the related model. So if we have a potentially dangerous optimization, those initial operations will "protect" the causal order, not "RemoveField".

 Raphael

Re: Some thoughts about improving migration squashing

Markus Holtermann
I'm not sure whether it's related to what you're investigating, but
RemoveField cannot "just" be optimized through, as you might have another
AddField operation afterwards adding another field with the same name.
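
A small sketch of the hazard described here, under loud assumptions: the classes are toy stand-ins (not Django's), and `can_move_across` is a hypothetical helper that only compares the (model, field name) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddField:
    model: str
    name: str
    kind: str  # hypothetical: just a label for the field type


@dataclass(frozen=True)
class RemoveField:
    model: str
    name: str


def can_move_across(op, barrier):
    """True if `op` touches a different field than `barrier` does."""
    return (op.model, op.name) != (barrier.model, barrier.name)


history = [
    AddField("A", "foo", "IntegerField"),
    RemoveField("A", "foo"),
    AddField("A", "foo", "CharField"),  # same name, re-added afterwards
]
```

Here the second AddField must stay to the right of the RemoveField: moving it across would make the RemoveField drop the new field instead of the old one.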

/Markus

On Thu, Feb 16, 2017 at 08:19:01AM -0800, [hidden email] wrote:

>Hey Simon,
>
> I looked through your PR and added a couple comments. The main thing is I
>think we can actually ignore the field context on "RemoveField", if only
>because the executor doesn't need it. Even though the field might be
>pointing to a related model, that doesn't prevent being optimized through.
>
> This is hard to explain, but intuitively, each "RemoveField" is paired
>with an "AddField" or "CreateModel" that *does *depend on the related
>model. So if we have a potentially dangerous optimization, those initial
>operations will "protect" the causal order, not "RemoveField".
>
> Raphael

Re: Some thoughts about improving migration squashing

raphael
When you have AddField('A', 'foo', ForeignKey('B')), this operation references A and foo, but also references B. 

RemoveField('A', 'foo') also references A and foo, but does it reference B? If it does, then it's hard to have optimizations that pass through this, because this field could be referencing any model (theoretically).

But if we assert that RemoveField doesn't refer to any models referenced by its field, then our optimizer can take a couple more liberties.
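
To make the difference concrete, here is a minimal sketch of the two answers to "does this operation reference B?". The method name is borrowed from Django's operations, but the classes are toy stand-ins and the `to` attribute is an assumption made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddField:
    model: str
    name: str
    to: Optional[str] = None  # hypothetical: model targeted by a ForeignKey

    def references_model(self, name):
        # The field definition is known, so the FK target counts as well.
        return name in (self.model, self.to)


@dataclass(frozen=True)
class RemoveField:
    model: str
    name: str

    def references_model(self, name):
        # No field definition is available; under the assertion discussed
        # above, only the model that owns the field is referenced.
        return name == self.model


add = AddField("A", "foo", to="B")
remove = RemoveField("A", "foo")
```

With this rule, an optimizer asking whether something about B can move past `remove` gets "yes", while `add` still blocks it.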

Raphael

On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann wrote:
I'm not sure if it's related or not wo what you're investigating,
RemoveField cannot "just" optimized through, as you might have another
AddField operation afterwards adding another field with the same name.

/Markus


Re: Some thoughts about improving migration squashing

Simon Charette
> RemoveField('A', 'foo') also references A and foo, but does it reference B? if it does, then it' s hard to have optimizations that pass through this, because this field could be referencing any model (theoretically).

I think we all agree on that.

> But if we assert that RemoveField doesn't refer to any models referenced to by its field, then our optimizer can take a couple more liberties.

Do you have a suggestion on how we can assert that this is the case? The only way I could come up with was to make sure RemoveField has a reference to the field it's removing, e.g. it would be generated in the form `RemoveField('A', 'foo', ForeignKey('B'))`.

Simon

On Thursday, February 16, 2017 at 18:25:16 UTC-5, [hidden email] wrote:
When you have AddField('A', 'foo', ForeignKey('B')), this operation references A and foo, but also references B. 

RemoveField('A', 'foo') also references A and foo, but does it reference B? if it does, then it' s hard to have optimizations that pass through this, because this field could be referencing any model (theoretically).

But if we assert that RemoveField doesn't refer to any models referenced to by its field, then our optimizer can take a couple more liberties.

Raphael


Re: Some thoughts about improving migration squashing

raphael
Hi Simon,

   I think it's a bit more general than that. Why does the `RemoveField` exist? Because somewhere, an `AddField`-esque operation exists before it, right?

  Let's say we have m, m' as models.

  Let -F be a RemoveField(m, 'foo', ForeignKey(m')) operation. We also have two operations: +M', a CreateModel(m'), and -M', a DeleteModel(m'). Because we have a RemoveField operation, we also have +M, a CreateModel(m), somewhere.

 Let's assume that we have an operations list like: [ ...(1)..., +M', ...(2)..., -F, ...(3)..., -M']. How do we know that we can reduce +M' and -M' together "through" -F?

Because we have -F, we have +M somewhere. Because m has a field pointing to m', either:
   1 - The field is within the initial +M operation. Because the definition depends on M', +M must be in group (2).
   2 - The field is not within the initial +M operation. Because the definition of the field depends on M', we need an AddField operation (or moral equivalent) +F in group (2).

So, if we have -F in between +M' and -M', we will also have either a +M which depends on M' or a +F which depends on M' within (2).

So if the -F is between the two, another operation that expresses the same dependency will also be present between them. The -F operation itself can therefore avoid expressing its indirect dependency on m', because another operation will do it for it.
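
The "guard" argument can be checked mechanically on a small list. The following is a sketch under assumptions, not part of any real optimizer: the operation classes are toy stand-ins, only foreign-key fields are tracked (as `(field name, target model)` pairs), and `guard_for` is a hypothetical helper that finds the earlier operation which introduced the field a RemoveField deletes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CreateModel:
    model: str
    fks: Tuple[Tuple[str, str], ...] = ()  # (field name, target model) pairs


@dataclass(frozen=True)
class DeleteModel:
    model: str


@dataclass(frozen=True)
class AddField:
    model: str
    name: str
    to: Optional[str] = None


@dataclass(frozen=True)
class RemoveField:
    model: str
    name: str


def introduces(op, model, name):
    """Return the FK target if `op` adds FK field `name` to `model`, else None."""
    if isinstance(op, AddField) and (op.model, op.name) == (model, name):
        return op.to
    if isinstance(op, CreateModel) and op.model == model:
        return dict(op.fks).get(name)
    return None


def guard_for(ops, remove_idx):
    """Index of the closest earlier operation that introduced the removed field."""
    removed = ops[remove_idx]
    for i in range(remove_idx - 1, -1, -1):
        if introduces(ops[i], removed.model, removed.name) is not None:
            return i
    return None


# Case 2 from the text: the field arrives through a separate AddField (+F).
case_2 = [
    CreateModel("Mp"),                  # +M'
    CreateModel("M"),                   # +M
    AddField("M", "foo", to="Mp"),      # +F
    RemoveField("M", "foo"),            # -F
    DeleteModel("Mp"),                  # -M'
]
# Case 1 from the text: the field is part of the initial CreateModel (+M).
case_1 = [
    CreateModel("Mp"),
    CreateModel("M", (("foo", "Mp"),)),
    RemoveField("M", "foo"),
    DeleteModel("Mp"),
]
```

In both lists the guard lies strictly between +M' and -F, so it already carries the dependency on m' that -F would otherwise have to express.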

----

It's hard to generalize this across everything because there are, after all, arbitrary migrations. But considering that RemoveField will only be reduced with a "moral equivalent" to AddField, I think we can expand this reasoning across everything to say that RemoveField will be sufficiently protected by the AddField operation's location (which will have to be after +M').


Raphael


On Fri, Feb 17, 2017 at 12:28 PM, charettes <[hidden email]> wrote:
> RemoveField('A', 'foo') also references A and foo, but does it reference B? if it does, then it' s hard to have optimizations that pass through this, because this field could be referencing any model (theoretically).

I think we all agree on that.

> But if we assert that RemoveField doesn't refer to any models referenced to by its field, then our optimizer can take a couple more liberties.

Do you have suggestion on how we can assert that it's the case? The only way I could come up with was to make sure RemoveField has a reference to the field it's removing. e.g. It would be generated in the form `RemoveField('A', 'foo', ForeignKey('B'))`.

Simon



Re: Some thoughts about improving migration squashing

raphael
To clarify my previous post: if we're in the first case, then the +M and -F operations can be optimized in one pass to remove the dependency, and then the +M' and -M' operations can be optimized through.

In the second case (with an AddField operation), the +F and -F operations will cancel each other out (since they will be "in between" +M' and -M') and then the optimization can happen.

A bit more holistically, since M's field depends on M' in the code (which is where these operations come from, after all), in order to remove M', you would first need to remove M's field, so you're almost guaranteed that a RemoveField will be "in between" any of its dependency's creations/removals.

If we have the field information in the RemoveField, we could check it. But if we make no assumption about RemoveField operations that carry no field information, a lot of possible optimizations are blocked. We could just do that (start generating RemoveField with field info), though it would not allow older migrations to get optimized. I'd be good with doing either or both.
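
Both cases can be played through with a small right-only reducer. This is a sketch under stated assumptions: the operation classes are toy stand-ins, `references`, `merge`, `optimize_once` and `optimize` are hypothetical helpers, only foreign-key fields are tracked, and RemoveField is deliberately treated as referencing nothing but its own model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CreateModel:
    model: str
    fks: Tuple[Tuple[str, str], ...] = ()  # (field name, target model) pairs


@dataclass(frozen=True)
class DeleteModel:
    model: str


@dataclass(frozen=True)
class AddField:
    model: str
    name: str
    to: Optional[str] = None


@dataclass(frozen=True)
class RemoveField:
    model: str
    name: str


def references(op, model):
    """Does `op` mention `model`? RemoveField ignores FK targets on purpose."""
    if isinstance(op, CreateModel):
        return model == op.model or any(to == model for _, to in op.fks)
    if isinstance(op, AddField):
        return model in (op.model, op.to)
    return model == op.model


def merge(op, other):
    """Return the replacement list if the pair reduces, else None."""
    if isinstance(other, RemoveField) and other.model == op.model:
        if isinstance(op, AddField) and op.name == other.name:
            return []  # +F followed by -F cancels out
        if isinstance(op, CreateModel) and other.name in dict(op.fks):
            kept = tuple(fk for fk in op.fks if fk[0] != other.name)
            return [CreateModel(op.model, kept)]  # +M absorbs -F
    if (isinstance(op, CreateModel) and isinstance(other, DeleteModel)
            and op.model == other.model):
        return []  # +M' followed by -M' cancels out
    return None


def optimize_once(ops):
    for i, op in enumerate(ops):
        for j in range(i + 1, len(ops)):
            merged = merge(op, ops[j])
            if merged is not None:
                # The result takes the place of the later operation.
                return ops[:i] + ops[i + 1:j] + merged + ops[j + 1:]
            if references(ops[j], op.model):
                break  # something depends on this model; stop scanning
    return ops


def optimize(ops):
    ops = list(ops)
    while True:
        new_ops = optimize_once(ops)
        if new_ops == ops:
            return ops
        ops = new_ops


case_1 = [CreateModel("Mp"), CreateModel("M", (("foo", "Mp"),)),
          RemoveField("M", "foo"), DeleteModel("Mp")]
case_2 = [CreateModel("Mp"), CreateModel("M"), AddField("M", "foo", to="Mp"),
          RemoveField("M", "foo"), DeleteModel("Mp")]
unrelated = [CreateModel("Mp"), RemoveField("Other", "bar"), DeleteModel("Mp")]
```

In both cases the operation that really depends on M' disappears first, and only then do +M' and -M' cancel. The `unrelated` list shows what the assumption buys: a RemoveField with no field information no longer blocks the pair around it.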

Raphael

On Fri, Feb 17, 2017 at 3:09 PM, Gaschignard, Raphael <[hidden email]> wrote:
Hi Simon,

   I think it's a bit more general than that. Why does the `RemoveField` exist? Because somewhere, an `AddField`-esque operation exists before it, right?

  Let's say we have m, m' as models.

  Let -F be a RemoveField(m, 'foo', ForeignKey(m') operation. We also have two operations +M', a CreateModel(m'), and -M', a RemoveModel(m') operation. Because we have a removeField operation, we also have +M, a "CreateModel(m), somewhere

 Let's assume that we have an operations list like:   [ ...(1)..., +M', ...(2)..., -F, ...(3)..., -M']. How do we know that we can reduce +M' and -M' together "through" -F?

Because we have -F, we have +M somewhere. because m has a field for m', either: 
   1 - The field is within the initial +M operation. Because the definition depends on M', it must be in group (2).
   2 - The field is not within the initial +M operation. Because the definition of the field depends on M', we need an AddField operation (or moral equivalent) +F in group (2)

So, if we have -F in between +M' and -M', we will also have either a +M which depends on M' or a +F which depends on M' within (2).

So if the -F is between the two, there will also be another operation that will be present between the two that expresses the same dependencies. So the -F operation itself can avoid expressing its "RemoveField" indirect dependency to m' because another operation will do it for them.

----

It's hard to generalize this across everything because there are, after all, arbitrary migrations. But considering that RemoveField will only be reduced with a "moral equivalent" to AddField, I think we can expand this reasoning across everything to say that RemoveField will be sufficiently protected by the AddField operation's location (which will have to be after +M').


Raphael



Re: Some thoughts about improving migration squashing

Simon Charette
Hello Raphael,

Thanks for your detailed explanation! You clearly expressed why it's safe to
optimize through RemoveField operations and helped me lift any doubt about
what was a wrong assumption[1].

I gave your two-pass optimization strategy a try and I believe I managed to
implement it correctly[2].

The plan is to modify the optimization algorithm to start by trying to perform
a "right" reduction (placing the result to the right of the operations in
between) and to attempt a "left" one if a subsequent operation can be reduced
with. The "left" reduction will only be performed if all the operations in
between can optimize through (their reduce() method returns True) the operation
we're trying to perform a reduction with.
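
One way to picture the right-then-left strategy is the sketch below. It is not the code from the PR: the two operation classes are toy stand-ins, `mentions` is a hypothetical helper, and the only thing taken from the description above is the convention that reduce() returns a list (the pair reduces), True (the other operation can be optimized through) or False (it cannot).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AddField:
    model: str
    name: str
    to: Optional[str] = None  # target model when the field is a foreign key

    def mentions(self, model):
        return model in (self.model, self.to)

    def reduce(self, other):
        # Toy rule: a different field may be optimized through this one;
        # a model operation may not if this field lives on or points at it.
        if isinstance(other, AddField):
            return (other.model, other.name) != (self.model, self.name)
        return not self.mentions(other.model)


@dataclass(frozen=True)
class CreateModel:
    model: str
    fields: Tuple[Tuple[str, Optional[str]], ...] = ()  # (name, fk target)

    def mentions(self, model):
        return model == self.model or any(to == model for _, to in self.fields)

    def reduce(self, other):
        # CreateModel + AddField on the same model fold into one CreateModel.
        if isinstance(other, AddField) and other.model == self.model:
            return [CreateModel(self.model,
                                self.fields + ((other.name, other.to),))]
        # Otherwise `other` may be optimized through unless it mentions us.
        return not other.mentions(self.model)


def optimize_once(ops):
    for i, op in enumerate(ops):
        right = True  # False once something blocks moving `op` to the right
        for j, other in enumerate(ops[i + 1:]):
            result = op.reduce(other)
            if isinstance(result, list):
                in_between = ops[i + 1:i + 1 + j]
                if right:
                    merged = in_between + result   # "right" reduction
                elif all(mid.reduce(other) is True for mid in in_between):
                    merged = result + in_between   # "left" reduction
                else:
                    break
                return ops[:i] + merged + ops[i + j + 2:]
            if result is not True:
                right = False
    return ops


def optimize(ops):
    ops = list(ops)
    while True:
        new_ops = optimize_once(ops)
        if new_ops == ops:
            return ops
        ops = new_ops


ops = [
    CreateModel("N"),
    CreateModel("M"),
    AddField("N", "m", to="M"),  # depends on M
    AddField("M", "x"),
]
optimized = optimize(ops)
```

The first pass is a "right" reduction: CreateModel("N") moves past CreateModel("M") and absorbs its foreign key. The second pass is a "left" reduction: the merged CreateModel("N") depends on M, so CreateModel("M") cannot move right, but AddField("M", "x") can be pulled left through it.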

Cheers,
Simon

[1] https://github.com/django/django/pull/7999/commits/6d0b740cbc9ae038f9eef95dac3057dccb283e6d
[2] https://github.com/django/django/pull/7999

On Friday, February 17, 2017 at 01:14:58 UTC-5, Gaschignard, Raphael wrote:
To clarify on my previous post, if we're in the first case, then the +M and -F operations can be optimized in one path to remove the dependency, and then the +M' and -M' operation can be optimized through.

In the second case (with an AddField operation), the +F and -F operations will cancel each other out (since they will be "in between" +M' and -M') and then the optimization can happen.

A bit more holistically, since M's field depends on M' in the code (which is where these operations come from, after all), in order to remove M', you would first need to remove M's field, so you're almost guaranteed that a RemoveField will be "in between" any of its dependency's creations/removals.

If we have the field information in the RemoveField, we could check it. But if we do not make an assumption on (no field information) RemoveFields, it blocks a lot of possible optimizations. We could just do that (start generating RemoveField with field info), though it would not allow for older migrations to get optimized. I'd be good with doing either/both. 

Raphael

On Fri, Feb 17, 2017 at 3:09 PM, Gaschignard, Raphael <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="unqrxTB_BwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">rap...@...> wrote:
Hi Simon,

   I think it's a bit more general than that. Why does the `RemoveField` exist? Because somewhere, an `AddField`-esque operation exists before it, right?

  Let's say we have m, m' as models.

  Let -F be a RemoveField(m, 'foo', ForeignKey(m') operation. We also have two operations +M', a CreateModel(m'), and -M', a RemoveModel(m') operation. Because we have a removeField operation, we also have +M, a "CreateModel(m), somewhere

 Let's assume that we have an operations list like:   [ ...(1)..., +M', ...(2)..., -F, ...(3)..., -M']. How do we know that we can reduce +M' and -M' together "through" -F?

Because we have -F, we have +M somewhere. because m has a field for m', either: 
   1 - The field is within the initial +M operation. Because the definition depends on M', it must be in group (2).
   2 - The field is not within the initial +M operation. Because the definition of the field depends on M', we need an AddField operation (or moral equivalent) +F in group (2)

So, if we have -F in between +M' and -M', we will also have either a +M which depends on M' or a +F which depends on M' within (2).

So if the -F is between the two, there will also be another operation that will be present between the two that expresses the same dependencies. So the -F operation itself can avoid expressing its "RemoveField" indirect dependency to m' because another operation will do it for them.

----

It's hard to generalize this across everything because there are, after all, arbitrary migrations. But considering that RemoveField will only be reduced with a "moral equivalent" to AddField, I think we can expand this reasoning across everything to say that RemoveField will be sufficiently protected by the AddField operation's location (which will have to be after +M').


Raphael


On Fri, Feb 17, 2017 at 12:28 PM, charettes <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="unqrxTB_BwAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">chare...@...> wrote:
> RemoveField('A', 'foo') also references A and foo, but does it reference B? if it does, then it' s hard to have optimizations that pass through this, because this field could be referencing any model (theoretically).

I think we all agree on that.

> But if we assert that RemoveField doesn't refer to any models referenced to by its field, then our optimizer can take a couple more liberties.

Do you have suggestion on how we can assert that it's the case? The only way I could come up with was to make sure RemoveField has a reference to the field it's removing. e.g. It would be generated in the form `RemoveField('A', 'foo', ForeignKey('B'))`.

Simon


Le jeudi 16 février 2017 18:25:16 UTC-5, [hidden email] a écrit :
When you have AddField('A', 'foo', ForeignKey('B')), this operation references A and foo, but also references B. 

RemoveField('A', 'foo') also references A and foo, but does it reference B? if it does, then it' s hard to have optimizations that pass through this, because this field could be referencing any model (theoretically).

But if we assert that RemoveField doesn't refer to any models referenced to by its field, then our optimizer can take a couple more liberties.

Raphael

On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann wrote:
I'm not sure if it's related or not wo what you're investigating,
RemoveField cannot "just" optimized through, as you might have another
AddField operation afterwards adding another field with the same name.

/Markus

On Thu, Feb 16, 2017 at 08:19:01AM -0800, [hidden email] wrote:

>Hey Simon,
>
> I looked through your PR and added a couple comments. The main thing is I
>think we can actually ignore the field context on "RemoveField", if only
>because the executor doesn't need it. Even though the field might be
>pointing to a related model, that doesn't prevent being optimized through.
>
> This is hard to explain, but intuitively, each "RemoveField" is paired
>with an "AddField" or "CreateModel" that *does *depend on the related
>model. So if we have a potentially dangerous optimization, those initial
>operations will "protect" the causal order, not "RemoveField".
>
> Raphael
>

Re: Some thoughts about improving migration squashing

Markus Holtermann
In reply to this post by raphael
On Thu, Feb 16, 2017 at 03:25:16PM -0800, [hidden email] wrote:
>When you have AddField('A', 'foo', ForeignKey('B')), this operation
>references A and foo, but also references B.

correct.

>RemoveField('A', 'foo') also references A and foo, but does it reference B?
>if it does, then it's hard to have optimizations that pass through this,
>because this field could be referencing any model (theoretically).

No, that field cannot reference just any model. It references exactly one
model (that even holds for FKs to abstract models, as fields from abstract
models are inlined in the concrete models in migrations). However,
RemoveField doesn't have the information to "just" figure out the
referenced model. RemoveField would need to look into the from_state's
apps and actually even look at the actual field that's referred to.
ForeignKeys have an implicit to_field attribute, so, having

  AddField('A', 'foo', ForeignKey('B', to_field='bar'))

a

  RemoveField('A', 'foo')

references exactly one field on one particular model. Not more and not
less. The issue here is that RemoveField needs to take that information
from the state and not from one of its attributes.

/Markus
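
A minimal sketch of that asymmetry, assuming toy stand-ins for the operations rather than Django's real classes. The `references` helper, the dictionary-based `from_state`, and the `"id"` default for `to_field` are all hypothetical and exist only for this illustration:

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class ForeignKey:
    """Toy field: which model, and which field on it, is pointed at."""
    to: str
    to_field: str = "id"  # assumption: stands in for the implicit to_field


@dataclass(frozen=True)
class AddField:
    model: str
    name: str
    field: ForeignKey  # the operation carries the field definition itself


@dataclass(frozen=True)
class RemoveField:
    model: str
    name: str  # no field attribute: only the model and field name are known


# Toy project state: {(model, field name): field definition}.
State = Dict[Tuple[str, str], ForeignKey]


def references(op: Union[AddField, RemoveField], from_state: State) -> Tuple[str, str]:
    """Return the (model, field) that the operation's field points at."""
    if isinstance(op, AddField):
        # AddField can answer from its own attributes.
        return (op.field.to, op.field.to_field)
    # RemoveField has to look the field up in the state that precedes it.
    field = from_state[(op.model, op.name)]
    return (field.to, field.to_field)


add = AddField("A", "foo", ForeignKey("B", to_field="bar"))
state_after_add: State = {(add.model, add.name): add.field}
remove = RemoveField("A", "foo")

add_target = references(add, {})
remove_target = references(remove, state_after_add)
```

Both calls resolve to the same single target, `("B", "bar")`, but only the AddField can do so without consulting the state.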

>
>But if we assert that RemoveField doesn't refer to any models referenced to
>by its field, then our optimizer can take a couple more liberties.
>
>Raphael
>

Re: Some thoughts about improving migration squashing

Patryk Zawadzki-2
On Monday, February 20, 2017 at 11:49:38 UTC+1, Markus Holtermann wrote:
> On Thu, Feb 16, 2017 at 03:25:16PM -0800, rap...@... wrote:
>> RemoveField('A', 'foo') also references A and foo, but does it reference B?
>> if it does, then it's hard to have optimizations that pass through this,
>> because this field could be referencing any model (theoretically).
>
> No, that field cannot reference just any model. It references exactly one
> model (that even holds for FKs to abstract models, as fields from abstract
> models are inlined in the concrete models in migrations). However,
> RemoveField doesn't have the information to "just" figure out the
> referenced model. RemoveField would need to look into the from_state's
> apps and actually even look at the actual field that's referred to.
> ForeignKeys have an implicit to_field attribute, so, having
>
>   AddField('A', 'foo', ForeignKey('B', to_field='bar'))
>
> a
>
>   RemoveField('A', 'foo')
>
> references exactly one field on one particular model. Not more and not
> less. The issue here is that RemoveField needs to take that information
> from the state and not from one of its attributes.

Technically it references _some_ model named "B" that was created not sooner than the current migration's explicit dependencies. It may be the model you saw when you created that migration or it may be some other model. You can tell your migration that it has to be executed no sooner than after another migration is complete but there is no way to say "but before model B is modified any further".
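
A rough illustration of that lower-bound-only constraint, using made-up migration names and a brute-force enumeration written for this sketch. It is not how Django builds its migration plan; it only shows which orders the declared dependencies alone would permit:

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

# Hypothetical migrations: a.0001 adds a ForeignKey to B and therefore
# depends on b.0001 (which creates B); b.0002 later alters B.
deps: Dict[str, Set[str]] = {
    "b.0001": set(),
    "b.0002": {"b.0001"},
    "a.0001": {"b.0001"},
}


def valid_orders(deps: Dict[str, Set[str]]) -> List[Tuple[str, ...]]:
    """Every ordering in which each migration runs after its dependencies."""
    orders = []
    for order in permutations(deps):
        applied: Set[str] = set()
        for migration in order:
            if not deps[migration] <= applied:
                break
            applied.add(migration)
        else:
            orders.append(order)
    return orders


def b_versions_seen(order: Tuple[str, ...], migration: str) -> int:
    """How many b.* migrations have already run when `migration` starts."""
    return sum(1 for m in order[: order.index(migration)] if m.startswith("b."))


orders = valid_orders(deps)
seen = {b_versions_seen(order, "a.0001") for order in orders}
```

Here `a.0001` may see B either as first created or after `b.0002` has changed it. The dependency on `b.0001` sets a lower bound only, and nothing in `deps` can express "before `b.0002`".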
