How to handle a migration that lasts too long to be deployed?

How to handle a migration that lasts too long to be deployed?

Adrien Cossa

Hi everybody!

I would like to know what options exist when you have a huge migration that obviously will not run on your production server.

I have split a model into two smaller ones and then wrote a migration to populate these new models. The number of original objects is around 250,000, and I also have a few references to update. In the end, the migration took more than 30 minutes on my machine (16 GB RAM, and it was swapping a lot) and it failed on another machine because it ran out of RAM (the process was using about 13 GB at that point). On the production server we have even less RAM, so running the migration as it is is really out of the question.

I have tried to use all the Django mechanisms I know to optimize the queries: select_related, prefetch_related, bulk_create, QuerySet.update... Right now, the migration I am talking about uses bulk_create(batch_size=None) and processes the whole queryset at once. Before that, when the migration was not so long because I had two fewer references to update, I tried other values for batch_size and also processed the queryset in pages of a few hundred or thousand objects. The results were no better than batch_size=None and "all at once", which is why I finally kept the basic settings (and the migration then took about 5 minutes). I will have to reintroduce some tweaks, because the extra updates of the two relations I mentioned make a big difference here.
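
For reference, the current migration has roughly the following shape. This is only a minimal sketch: the app label, model and field names (OldRecord, NewCore, NewDetail, owner, name, notes, legacy_pk, the dependency) are made up, but the RunPython + bulk_create structure is what is described above.

from django.db import migrations


def split_records(apps, schema_editor):
    # Use the historical models, not direct imports.
    OldRecord = apps.get_model("myapp", "OldRecord")   # hypothetical names
    NewCore = apps.get_model("myapp", "NewCore")
    NewDetail = apps.get_model("myapp", "NewDetail")

    cores, details = [], []
    # select_related keeps the query count down, but building the whole
    # list in memory at once is what costs RAM.
    for old in OldRecord.objects.select_related("owner").all():
        cores.append(NewCore(owner=old.owner, name=old.name))
        details.append(NewDetail(notes=old.notes, legacy_pk=old.pk))

    # batch_size=None is the default: everything goes in one batch where
    # the backend allows it; a smaller batch_size trades speed for memory.
    NewCore.objects.bulk_create(cores, batch_size=None)
    NewDetail.objects.bulk_create(details, batch_size=None)


class Migration(migrations.Migration):

    dependencies = [("myapp", "0041_add_new_models")]   # hypothetical

    operations = [
        migrations.RunPython(split_records, migrations.RunPython.noop),
    ]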

I am wondering if someone has already found themselves in a similar situation, and what solution you finally came up with.

If the migration takes very long, that is not a problem in itself, but I don't want to lock the database for 15 minutes. The fact is that I don't know exactly what happens during the migration process: what is locked, and by what? I will split the migration into "pages" to use less RAM anyway, but I was also thinking of migrating in two different steps *or* files, in order to process separately the objects that are not editable (basically most of them, which we keep for history but are read-only) and the others (which should be much faster, so people at work will not be blocked for long). Does that make sense? Any other ideas?

Thanks a lot!

Adrien

Re: How to handle a migration that lasts too long to be deployed?

Mike Dewhirst-3
Does it actually stop users reading? If the entire migration happens in a single transaction, the database (Postgres anyway) should remain accessible until the moment it is committed.
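
(For the record, this behaviour is what the migration's atomic flag controls; a minimal sketch, nothing project-specific:)

from django.db import migrations


class Migration(migrations.Migration):

    # On backends with transactional DDL (PostgreSQL), this is the default:
    # every operation below runs inside one transaction and becomes visible
    # to other sessions only when that transaction commits.
    atomic = True

    dependencies = []
    operations = [
        # schema changes and RunPython data copying all commit together
    ]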

Maybe you could announce a maintenance operation which will only interrupt certain actions for a few minutes?

Mike

Connected by Motorola


Re: How to handle a migration that lasts too long to be deployed?

Mike Dewhirst-3
In reply to this post by Adrien Cossa
Adrien

May I start a new thread to discuss with you the costs and benefits of splitting your model? I have some big models (47 to 76 columns) which I have long thought are just too big. The splits would all have to be 1:1 with the core model.

Mike

Re: How to handle a migration that lasts too long to be deployed?

samuel muiruri
In reply to this post by Adrien Cossa
Maybe not adequate, but you could dumpdata from a model and loaddata it separately, rather than having the migration load the data at the same time.
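
One way to read that suggestion, sketched with Django's dumpdata/loaddata management commands (the model label and file name are made up): serialize the existing rows before deploying, keep the migration itself schema-only, and load or transform the data afterwards.

from django.core.management import call_command

# Before the deployment: dump the model that is about to be split.
call_command("dumpdata", "myapp.OldRecord", output="oldrecord.json")

# After the schema-only migration has run: load the fixture back (if the
# old model still exists), or feed it to a script / management command
# that writes the two new models instead.
call_command("loaddata", "oldrecord.json")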

Re: How to handle a migration that lasts too long to be deployed?

Adrien Cossa
In reply to this post by Mike Dewhirst-3
Hi,

On 04/14/2018 02:38 AM, Mike Dewhirst wrote:
Does it actually stop users reading? If the entire migration happens in a single transaction, the database (Postgres anyway) should remain accessible until the moment it is committed.

Maybe you could announce a maintenance operation which will only interrupt certain actions for a few minutes?

I am not sure I understand exactly how that works. If the migration is atomic, is it true that:
- users can read normally between the beginning of the migration and the beginning of the transaction commit, and they will get the old data;
- users trying to read during the transaction commit will have to wait for it to finish, and they will get the migrated data;
- users who try to write at any time between the beginning of the migration and the end of the transaction commit will have to wait for it to finish, and they might overwrite the migrated data?

If it works like that, I think that the solution I was thinking about is good enough for my needs:

- process the queryset in chunks. Note that if you want to use prefetch_related, you can't use a queryset iterator, so I get the successive chunks by filtering on PK ranges (a sketch follows after this list). I have benchmarked a bit and a good value for the chunk size seems to be 500; it's not slower than any other value and it keeps the memory usage down. If I had a lot of RAM, I could also raise this value to 5,000 or 15,000 without really slowing down the process.

- I actually have two separate atomic migrations (in case something goes wrong, this is better than one non-atomic migration with two atomic operations): one to process the objects that are not modifiable by the users (because they are in a certain status, etc.) and a second to process the objects that could be modified by the users. The second migration concerns only 4-5% of the total objects, so it should be much faster. As I use PK ranges to fetch the objects, I have to scale the desired chunk size of 500 in order to get chunks of approximately the same size, using 500 * total_objects_count / filtered_queryset_count.
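
A sketch of the first of these two migrations. The app label, model and field names (OldRecord, NewCore, NewDetail, a status field marking the read-only objects, owner/tags relations, the dependency) are all made up; only the overall shape — an atomic RunPython operation that pages by primary key, prefetches relations, and bulk_creates the new rows — is the approach described above.

from django.db import migrations

CHUNK_SIZE = 500


def copy_readonly_objects(apps, schema_editor):
    OldRecord = apps.get_model("myapp", "OldRecord")   # hypothetical names
    NewCore = apps.get_model("myapp", "NewCore")
    NewDetail = apps.get_model("myapp", "NewDetail")

    qs = OldRecord.objects.filter(status="archived")   # the read-only part
    pks = list(qs.order_by("pk").values_list("pk", flat=True))

    # Page by primary key instead of using .iterator(), so that
    # prefetch_related still works on each chunk.
    for start in range(0, len(pks), CHUNK_SIZE):
        chunk_pks = pks[start:start + CHUNK_SIZE]
        chunk = (qs.filter(pk__in=chunk_pks)
                   .select_related("owner")
                   .prefetch_related("tags"))
        cores, details = [], []
        for old in chunk:
            cores.append(NewCore(owner=old.owner, name=old.name))
            details.append(NewDetail(notes=old.notes, legacy_pk=old.pk))
        NewCore.objects.bulk_create(cores)
        NewDetail.objects.bulk_create(details)


class Migration(migrations.Migration):

    atomic = True   # the whole copy commits (or rolls back) as one unit

    dependencies = [("myapp", "0041_add_new_models")]   # hypothetical

    operations = [
        migrations.RunPython(copy_readonly_objects, migrations.RunPython.noop),
    ]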

There remains the problem of users who would try to write during the second migration: their changes will indeed be written to the old model, but not taken into account by the new models (remember I want to split one model into two smaller ones). So maybe I should append to the second migration all the operations responsible for deleting the old model? This way, people trying to write will get an error, which is the best we can do here. Am I right?
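
If you go that route, the second migration could end with the operations that remove the old model, so that any write arriving after the copy fails loudly instead of silently landing in data that was already migrated. A hedged sketch, again with made-up names; the DeleteModel operation would normally be generated by makemigrations.

from django.db import migrations


def copy_editable_objects(apps, schema_editor):
    # Same chunked copy logic as in the previous sketch, restricted to
    # the ~4-5% of objects that users can still modify.
    ...


class Migration(migrations.Migration):

    atomic = True

    dependencies = [("myapp", "0042_copy_readonly_objects")]   # hypothetical

    operations = [
        migrations.RunPython(copy_editable_objects, migrations.RunPython.noop),
        # Dropping the old table happens in the same transaction, so once
        # this migration commits, late writes to the old model raise errors.
        migrations.DeleteModel(name="OldRecord"),
    ]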

Thanks for your help!

Cheers,
Adrien

Re: How to handle a migration that lasts too long to be deployed?

Mike Dewhirst-3
On 17/04/2018 6:27 PM, Adrien Cossa wrote:

> [...]
>
> There remains the problem of users who would try to write during the
> second migration: their changes will indeed be written to the old model,
> but not taken into account by the new models. So maybe I should append to
> the second migration all the operations responsible for deleting the old
> model? This way, people trying to write will get an error, which is the
> best we can do here. Am I right?

It would be nice to avoid errors. That is why I suggested announcing that you intend to take the system offline for a short period. It takes all the pressure off and you can choose the simplest mechanism.

Users will get a benefit from the migration or you wouldn't be doing it.
Therefore they should be happy to accept a little downtime. You might
have to do a bit of selling :)

I might consider making production read-only, dumping the database, loading it on a fast machine with heaps of RAM and an SSD for the migration, then dumping and reloading on the production machine.

That way you can leave it online read-only and take it offline only for
the relatively brief reload after the off-site migration. A bit of
practice and timing will indicate whether that method has legs. Or wings!
