Commitable json dumps


Brice PARENT-2
Hello,

I've had a few customers for whom I've had to create a repository with
all their almost-static data (page contents, etc.). To do that, when
they want a backup, a script calls "manage.py dumpdata --indent=4
[app].[model] > app_model.json" several times, then commits the whole
thing. The customer may then update their version of the repo.

But whenever there are some changes, I'd like to be able to see them
easily (that's the reason for the --indent). Right now, the field
order changes frequently, as the order has no meaning. But in a diff,
the order changes everything. It's almost impossible to see the changes
because every line has moved.
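
To make the problem concrete, here is a minimal sketch using only Python's standard library (no Django involved). The record, the helper name changed_lines and the variable names are invented for illustration; the two dicts hold the same data and differ only in key order, which stands in for two runs of a serializer that does not fix the field order.

```python
import difflib
import json

# Hypothetical record: same data, keys inserted in a different order,
# standing in for two dumps whose field order differs.
old = {"title": "Home", "url": "/", "content": "<p>Hi</p>"}
new = {"content": "<p>Hi</p>", "url": "/", "title": "Home"}


def changed_lines(before, after):
    """Return the added/removed lines of a unified diff, without file headers."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# The two objects compare equal, yet their indented dumps do not diff cleanly.
noise = changed_lines(json.dumps(old, indent=4), json.dumps(new, indent=4))
print(old == new)
print("\n".join(noise))
```

Nothing in the data changed, but the diff still reports moved lines, which is the noise described above.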

I have no idea if this should be an argument to dumpdata or a special
behaviour on the serializer's side, but having the fields sorted during
serialization doesn't change the validity of the data and makes the
diffs far more explicit.

Here is how it can be done for the JSON serializer
(django/core/serializers/json.py:60 for Django 1.8):
    json.dump(self.get_dump_object(obj), self.stream, cls=DjangoJSONEncoder,
              sort_keys=True, **self.json_kwargs)

(I added the sort_keys=True argument)
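
The effect of sort_keys can be checked outside Django with the standard json module. This is only a sketch under assumptions: dump_sorted is a hypothetical helper, and the two lists are hand-written records shaped like dumpdata output, not the result of a real dump.

```python
import json


def dump_sorted(objects, indent=4):
    """Serialize records with every mapping's keys in sorted order."""
    # sort_keys applies to nested dicts too, so "fields" is sorted as well.
    return json.dumps(objects, indent=indent, sort_keys=True)


# Hypothetical records: identical data, different key order at both levels.
first_run = [
    {"model": "flatpages.flatpage", "pk": 13,
     "fields": {"title": "Multiline", "url": "/my/test/"}},
]
second_run = [
    {"pk": 13, "fields": {"url": "/my/test/", "title": "Multiline"},
     "model": "flatpages.flatpage"},
]

# Without sorting, the text follows insertion order and the dumps differ.
print(json.dumps(first_run, indent=4) == json.dumps(second_run, indent=4))
# With sorting, both runs produce the same text.
print(dump_sorted(first_run) == dump_sorted(second_run))
```

The same function could be run over an existing dump file (json.load, then dump_sorted) to normalize it before committing, without touching Django itself.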

I haven't looked at whether there is an equivalent for the other
serializers, nor whether it would make any sense without the "indent"
argument. For now it's just an idea that feels good, but it probably
requires more thought and advice before being investigated more deeply.
I haven't run any test suite yet, so I don't know if there are any side
effects. I'm just validating the idea here.

Any thoughts?

Brice Parent



--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/818b56b8-716d-d80f-ade2-1f3424206b08%40brice.xyz.
For more options, visit https://groups.google.com/d/optout.

Re: Commitable json dumps

Adam Johnson-2
PyYAML sorts keys by default, so if you use the YAML serializer, that should work for your use case.

I think patching the JSON serializer to be deterministic by default is a good idea; the performance cost of sorting keys is pretty small compared to disk operations.

On 6 March 2017 at 11:32, Brice PARENT <[hidden email]> wrote:
> I have no idea if this should be an argument to dumpdata, or a special behaviour on the serializer's side, but having the fields sorted during the serialization doesn't change the validity of the data, but allows the diffs to be way more explicit.

--
Adam


Re: Commitable json dumps

Brice PARENT-2
On 06/03/17 at 14:28, Adam Johnson wrote:
> PyYAML sorts keys by default, so if you use the YAML serializer that
> should work for your usecase.
I think it will become my new default!
Thanks for the info.

Brice


Re: Commitable json dumps

Brice PARENT-2
In reply to this post by Adam Johnson-2

On 06/03/17 at 13:28, James Pic wrote:
>
> Django-dbdiff solved that serialization issue, specifically to create
> diff outputs, in earlier versions. Now it has its own diff engine
> built in though, definitely worth taking a look.
>
I will! But for my use case, it appears that YAML would be a better
idea, as it should already work with stock Django.
Thanks!

Brice


Re: Commitable json dumps

Brice PARENT-2
In reply to this post by Brice PARENT-2

On 06/03/17 at 15:44, Brice PARENT wrote:
> On 06/03/17 at 14:28, Adam Johnson wrote:
>> PyYAML sorts keys by default, so if you use the YAML serializer that
>> should work for your usecase.
> I think it will become my new default !
It appears that the rendered format is not very consistent, or at least
that's what I've found. YAML seems to offer a short and a long syntax.
I tried with 2 models: one from stock Django (flatpages), whose output
corresponds to my needs, and a custom one, where the syntax used doesn't
create a new line for each field. (I edited the outputs to focus on the
idea and remove irrelevant content.)

./manage.py dumpdata --format yaml --indent 4 flatpages
-   fields:
        content: '<p>First line.</p>
            <p>Second line</p>'
        enable_comments: false
        registration_required: false
        sites: [1]
        template_name: ''
        title: Multiline
        url: /my/test/
    model: flatpages.flatpage
    pk: 13

./manage.py dumpdata --format yaml --indent 4 myapp
-   fields: {content: "<p>First line.</p>\r\n\r\n<h3>Second line</h3>\r\
            \n", module: 1, position: 1, summary: "<h3>First line.</h3>\r\
            \n\r\n<h3>Second line</h3>\r\", title: "My title"}
    model: myapp.mymodel
    pk: 1

So with the same command, I've gotten two formats: one that is
git-friendly, and one that isn't. I haven't yet looked at the source
code to see why it chose one syntax over the other, though.

Brice


Re: Commitable json dumps

Adam Johnson-2
Ah yes, PyYAML just does this. It can be disabled by passing a different option to yaml.dump (I think default_flow_style=False), but that would be similar to changing the JSON serializer.
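
For reference, a small sketch of what that option changes, assuming PyYAML is installed and calling it directly rather than through Django's serializer. The record is invented for illustration. As far as I know, with default_flow_style=None PyYAML writes any collection that contains only scalars in the inline (flow) form, which may explain why the flatpages output above, whose fields include the nested list "sites: [1]", came out in block form while the custom model did not.

```python
import yaml  # PyYAML, a third-party package

# Hypothetical dumpdata-shaped record whose "fields" holds only scalars.
records = [
    {"model": "myapp.mymodel", "pk": 1,
     "fields": {"title": "My title", "position": 1, "module": 1}},
]

# None lets PyYAML choose: scalar-only collections are written inline.
flow = yaml.safe_dump(records, default_flow_style=None, indent=4)
# False forces block style, so every field gets its own line.
block = yaml.safe_dump(records, default_flow_style=False, indent=4)

print(flow)
print(block)
```

Both strings load back to the same data; only the block form puts one field per line, which is what a line-based diff needs.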

On 6 March 2017 at 16:53, Brice PARENT <[hidden email]> wrote:
> So with the same command, I've gotten two formats, one that is git-friendly, and one that isn't.

--
Adam


Re: Commitable json dumps

Adam Johnson-2
Wait, I just looked into this further, and discovered that the ordering of fields was made deterministic for all serializers in #24558 - this was released in Django 1.9! Enjoy👌

On 7 March 2017 at 22:23, Adam Johnson <[hidden email]> wrote:
> Ah yes, PyYAML just does this. It can be disabled by passing a different option to yaml.dump (I think default_flow_style=False), but that would be similar to changing the JSON serializer.

--
Adam


Re: Commitable json dumps

Brice PARENT-2

> Wait, I just looked into this further, and discovered that the ordering of fields was made deterministic for all serializers in #24558 - this was released in Django 1.9! Enjoy👌
Nice. Thanks for the info!
So I'll wait for v1.11 for that, no problem! (As a freelancer, I only deploy LTS versions for my customers.)

Brice
