enhancement request: make py3 read/write py2 pickle format


Neal Becker
One of the most annoying problems with py2/3 interoperability is that the
pickle formats are not compatible.  There must be many who, like myself,
often use pickle format for data storage.

It certainly would be a big help if py3 could read/write py2 pickle format.  
You know, backward compatibility?



Mark Lawrence
On 09/06/2015 19:08, Neal Becker wrote:
> One of the most annoying problems with py2/3 interoperability is that the
> pickle formats are not compatible.  There must be many who, like myself,
> often use pickle format for data storage.
>
> It certainly would be a big help if py3 could read/write py2 pickle format.
> You know, backward compatibility?
>

http://bugs.python.org/issue13566

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Laura Creighton-2
In reply to this post by Neal Becker
In a message of Tue, 09 Jun 2015 14:08:25 -0400, Neal Becker writes:
>One of the most annoying problems with py2/3 interoperability is that the
>pickle formats are not compatible.  There must be many who, like myself,
>often use pickle format for data storage.
>
>It certainly would be a big help if py3 could read/write py2 pickle format.  
>You know, backward compatibility?
>
>--
>https://mail.python.org/mailman/listinfo/python-list

We have an issue about that.
https://bugs.python.org/issue13566

Go there and say you want it too. :)

Laura


Chris Warrick
In reply to this post by Neal Becker
On Tue, Jun 9, 2015 at 8:08 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
> One of the most annoying problems with py2/3 interoperability is that the
> pickle formats are not compatible.  There must be many who, like myself,
> often use pickle format for data storage.
>
> It certainly would be a big help if py3 could read/write py2 pickle format.
> You know, backward compatibility?

Don't use pickle. It's unsafe - it executes arbitrary code, which
means someone can give you a pickle file that will delete all your
files or eat your cat.

Instead, use a safe format that has no ability to execute code, like
JSON. It will also work with other programming languages and
environments if you ever need to talk to anyone else.

But, FYI: there is backwards compatibility if you ask for it, in the
form of protocol versions. That's all you should know - again, don't
use pickle.
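
For anyone who does have to exchange pickles, the protocol-version point can be sketched roughly as follows. This is a minimal illustration, assuming the writer runs Python 3 and the reader may be Python 2, which only understands protocols 0 through 2. The `record` data is made up for the example.

```python
import pickle

record = {"name": "spam", "values": [1, 2, 3]}  # made-up sample data

# Python 3 defaults to a protocol that Python 2 cannot read.
default_blob = pickle.dumps(record)

# Protocol 2 is the newest protocol that Python 2 understands.
py2_blob = pickle.dumps(record, protocol=2)

# From protocol 2 on, a pickle starts with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("protocol byte in default blob:", default_blob[1])
print("protocol byte in py2-friendly blob:", py2_blob[1])

# Python 3 reads both blobs without being told the protocol.
assert pickle.loads(default_blob) == record
assert pickle.loads(py2_blob) == record
```

Note that text strings written this way come back as `unicode` objects on the Python 2 side.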

--
Chris Warrick <https://chriswarrick.com/>
PGP: 5EAAEA16


Zachary Ware-2
In reply to this post by Neal Becker
On Tue, Jun 9, 2015 at 1:08 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
> One of the most annoying problems with py2/3 interoperability is that the
> pickle formats are not compatible.  There must be many who, like myself,
> often use pickle format for data storage.
>
> It certainly would be a big help if py3 could read/write py2 pickle format.
> You know, backward compatibility?

Uhh...

$ python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('test.pkl', 'wb') as f:
...  pickle.dump({'test': [1, 2, {3}]}, f)
...
>>> with open('test.pkl', 'rb') as f:
...  pickle.load(f)
...
{'test': [1, 2, set([3])]}
>>> ^D
$ python3
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('test.pkl', 'rb') as f:
...  pickle.load(f)
...
{'test': [1, 2, {3}]}
>>> with open('test2.pkl', 'wb') as f:
...  pickle.dump(['test', {2: {3.4}}], f, protocol=2)
...
>>> with open('test2.pkl', 'rb') as f:
...  pickle.load(f)
...
['test', {2: {3.4}}]
>>> ^D
$ python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('test2.pkl', 'rb') as f:
...  pickle.load(f)
...
[u'test', {2: set([3.4])}]
>>>
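
The transcript above only uses ASCII text. One place where the two versions do tend to disagree is Python 2 `str` objects holding non-ASCII bytes. The sketch below assumes that case; the byte string is hand-written to match what Python 2's `pickle.dumps('caf\xe9', 2)` would produce, so that the example runs on Python 3 alone.

```python
import pickle

# Hand-written bytes equivalent to Python 2's pickle.dumps('caf\xe9', 2):
# PROTO 2, SHORT_BINSTRING of length 4, BINPUT 0, STOP.
py2_blob = b"\x80\x02U\x04caf\xe9q\x00."

# With the default encoding="ASCII", the non-ASCII byte cannot be decoded.
try:
    pickle.loads(py2_blob)
except UnicodeDecodeError as exc:
    print("default load failed:", exc)

# Option 1: decode Python 2 str objects as Latin-1 text.
as_text = pickle.loads(py2_blob, encoding="latin1")

# Option 2: keep Python 2 str objects as raw bytes.
as_bytes = pickle.loads(py2_blob, encoding="bytes")

print(repr(as_text), repr(as_bytes))
```

Which option is right depends on whether the Python 2 code was storing text or binary data in that `str`.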

--
Zach


Serhiy Storchaka-2
In reply to this post by Neal Becker
On 09.06.15 21:08, Neal Becker wrote:
> One of the most annoying problems with py2/3 interoperability is that the
> pickle formats are not compatible.  There must be many who, like myself,
> often use pickle format for data storage.
>
> It certainly would be a big help if py3 could read/write py2 pickle format.
> You know, backward compatibility?

Pickle format is mostly compatible. What is your issue?




Serhiy Storchaka-2
In reply to this post by Laura Creighton-2
On 09.06.15 21:31, Laura Creighton wrote:
> We have an issue about that.
> https://bugs.python.org/issue13566
>
> Go there and say you want it too. :)

I'm afraid the issue title is too general. :) The issue is only about one
minor detail of py3 to py2 compatibility.



Devin Jeanpierre
In reply to this post by Chris Warrick
There's a lot of subtle issues with pickle compatibility. e.g.
old-style vs new-style classes. It's kinda hard and it's better to
give up. I definitely agree it's better to use something else instead.
For example, we switched to using protocol buffers, which have much
better compatibility properties and are a bit more testable to boot
(since text format protobufs are always output in a canonical (sorted)
form.)

-- Devin

On Tue, Jun 9, 2015 at 11:35 AM, Chris Warrick <kwpolska at gmail.com> wrote:

> On Tue, Jun 9, 2015 at 8:08 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>> One of the most annoying problems with py2/3 interoperability is that the
>> pickle formats are not compatible.  There must be many who, like myself,
>> often use pickle format for data storage.
>>
>> It certainly would be a big help if py3 could read/write py2 pickle format.
>> You know, backward compatibility?
>
> Don't use pickle. It's unsafe - it executes arbitrary code, which
> means someone can give you a pickle file that will delete all your
> files or eat your cat.
>
> Instead, use a safe format that has no ability to execute code, like
> JSON. It will also work with other programming languages and
> environments if you ever need to talk to anyone else.
>
> But, FYI: there is backwards compatibility if you ask for it, in the
> form of protocol versions. That's all you should know - again, don't
> use pickle.
>
> --
> Chris Warrick <https://chriswarrick.com/>
> PGP: 5EAAEA16


Chris Angelico
On Wed, Jun 10, 2015 at 6:07 AM, Devin Jeanpierre
<jeanpierreda at gmail.com> wrote:
> There's a lot of subtle issues with pickle compatibility. e.g.
> old-style vs new-style classes. It's kinda hard and it's better to
> give up. I definitely agree it's better to use something else instead.
> For example, we switched to using protocol buffers, which have much
> better compatibility properties and are a bit more testable to boot
> (since text format protobufs are always output in a canonical (sorted)
> form.)

Or use JSON, if your data fits within that structure. It's easy to
read and write, it's human-readable, and it's safe (no chance of
arbitrary code execution). Forcing yourself to use a format that can
basically be processed by ast.literal_eval() is a good discipline -
means you don't accidentally save/load too much.
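
A rough sketch of what "fits within that structure" means in practice, using only the standard `json` module. The sample data is made up; the set example mirrors the one used in the pickle transcript earlier in the thread.

```python
import json

# Made-up data that stays inside JSON's model: dicts with string keys,
# lists, strings, numbers, booleans and None.
data = {"test": [1, 2, {"nested": 3.4}], "flag": True, "missing": None}

text = json.dumps(data, sort_keys=True)
assert json.loads(text) == data  # clean round trip

# Types outside that model are rejected instead of being stored silently.
try:
    json.dumps({"test": [1, 2, {3}]})  # contains a set
except TypeError as exc:
    print("not JSON serializable:", exc)

# Some conversions are lossy: tuples become lists and
# non-string keys become strings.
lossy = json.loads(json.dumps({1: (2, 3)}))
print(lossy)
```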

ChrisA


Irmen de Jong
In reply to this post by Devin Jeanpierre
On 10-6-2015 1:06, Chris Angelico wrote:

> On Wed, Jun 10, 2015 at 6:07 AM, Devin Jeanpierre
> <jeanpierreda at gmail.com> wrote:
>> There's a lot of subtle issues with pickle compatibility. e.g.
>> old-style vs new-style classes. It's kinda hard and it's better to
>> give up. I definitely agree it's better to use something else instead.
>> For example, we switched to using protocol buffers, which have much
>> better compatibility properties and are a bit more testable to boot
>> (since text format protobufs are always output in a canonical (sorted)
>> form.)
>
> Or use JSON, if your data fits within that structure. It's easy to
> read and write, it's human-readable, and it's safe (no chance of
> arbitrary code execution). Forcing yourself to use a format that can
> basically be processed by ast.literal_eval() is a good discipline -
> means you don't accidentally save/load too much.
>
> ChrisA
>

I made a specialized serializer for this, which is more expressive than JSON. It outputs
python literal expressions that can be directly parsed by ast.literal_eval(). You can
find it on pypi (https://pypi.python.org/pypi/serpent).  It's the default serializer of
Pyro, and it includes Java and .NET versions as well, as an added bonus.


Irmen




Devin Jeanpierre
Passing around data that can be put into ast.literal_eval is
synonymous with passing around data that can be put into eval. It
sounds like a trap.

Other points against JSON / etc.: the lack of schema makes it easier
to stuff anything in there (not as easily as pickle, mind), and by
returning a plain dict, it becomes easier to require a field than to
allow a field to be missing, which is bad for robustness and bad for
data format migrations. (Protobuf (v3) has schemas and gives every
field a default value.)
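
To make the "required vs. missing field" point concrete, here is a small sketch using plain JSON. The field names and the `DEFAULTS` table are hypothetical; the point is only that with a bare dict, tolerating a missing field takes deliberate extra code.

```python
import json

# Hypothetical defaults table standing in for what a schema would provide.
DEFAULTS = {"length": 0.0, "width": 0.0, "landscape": False}

def load_settings(text):
    """Load settings, filling in defaults and ignoring unknown keys."""
    raw = json.loads(text)
    settings = dict(DEFAULTS)
    settings.update((k, v) for k, v in raw.items() if k in DEFAULTS)
    return settings

# A file written by an older version that only knew about "length".
old_file = '{"length": 23.45}'

# The easy way makes every field required: a missing key is an error.
try:
    json.loads(old_file)["width"]
except KeyError as exc:
    print("naive access failed on missing field:", exc)

# The tolerant way needs the explicit defaults step.
settings = load_settings(old_file)
print(settings)
```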

For human readable serialized data, text format protocol buffers are
seriously underrated. (Relatedly: underdocumented, too.)

/me lifts head out of kool-aid and gasps for air

-- Devin

On Tue, Jun 9, 2015 at 5:17 PM, Irmen de Jong <irmen.NOSPAM at xs4all.nl> wrote:

> On 10-6-2015 1:06, Chris Angelico wrote:
>> On Wed, Jun 10, 2015 at 6:07 AM, Devin Jeanpierre
>> <jeanpierreda at gmail.com> wrote:
>>> There's a lot of subtle issues with pickle compatibility. e.g.
>>> old-style vs new-style classes. It's kinda hard and it's better to
>>> give up. I definitely agree it's better to use something else instead.
>>> For example, we switched to using protocol buffers, which have much
>>> better compatibility properties and are a bit more testable to boot
>>> (since text format protobufs are always output in a canonical (sorted)
>>> form.)
>>
>> Or use JSON, if your data fits within that structure. It's easy to
>> read and write, it's human-readable, and it's safe (no chance of
>> arbitrary code execution). Forcing yourself to use a format that can
>> basically be processed by ast.literal_eval() is a good discipline -
>> means you don't accidentally save/load too much.
>>
>> ChrisA
>>
>
> I made a specialized serializer for this, which is more expressive than JSON. It outputs
> python literal expressions that can be directly parsed by ast.literal_eval(). You can
> find it on pypi (https://pypi.python.org/pypi/serpent).  It's the default serializer of
> Pyro, and it includes Java and .NET versions as well, as an added bonus.
>
>
> Irmen
>
>


Chris Angelico
On Wed, Jun 10, 2015 at 10:47 AM, Devin Jeanpierre
<jeanpierreda at gmail.com> wrote:
> Passing around data that can be put into ast.literal_eval is
> synonymous with passing around data that can be put into eval. It
> sounds like a trap.

Except that it's hugely restricted. There are JSON parsing libraries
all over the place, and one of the points in favour of JSON is that it
can be dumped into JavaScript as-is and it evaluates correctly.
Doesn't mean you have to use eval() to parse it, and usually you
won't.

> Other points against JSON / etc.: the lack of schema makes it easier
> to stuff anything in there (not as easily as pickle, mind), and by
> returning a plain dict, it becomes easier to require a field than to
> allow a field to be missing, which is bad for robustness and bad for
> data format migrations. (Protobuf (v3) has schemas and gives every
> field a default value.)

This is true. But there are plenty of cases where you can manage on a
simple dictionary that maps keywords to values that are either
strings, numbers, or lists/dicts of same - and without any schema to
tell you what keywords are valid. It's simple, extensible, and scales
reasonably well to mid-sized use cases. Sure, the biggest cases (and
some of the smaller ones) benefit nicely from schema definitions, but
there's room to manage without.

ChrisA


Steven D'Aprano-11
In reply to this post by Irmen de Jong
On Wednesday 10 June 2015 10:47, Devin Jeanpierre wrote:

> Passing around data that can be put into ast.literal_eval is
> synonymous with passing around data that can be put into eval. It
> sounds like a trap.

In what way?

literal_eval will cleanly and safely refuse to evaluate strings like:

    "len(None)"
    "100**100**100"
    "__import__('os').system('rm this')"


and so on, which makes it significantly safer when given untrusted data. I
suppose that one might be able to perform a DOS attack by passing it:

    "1000 ... 0"

where the ... represents, say, a gigabyte of zeroes, but if an attacker has
the ability to feed you gigabytes of data, they don't need literal_eval to
DOS you.

If you can think of an actual attack against literal_eval, please tell us or
report it, so it can be fixed.
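
The refusals described above are easy to check. A short sketch, using the three strings from this post plus one made-up literal that should be accepted:

```python
import ast

# A plain literal is evaluated to the corresponding Python object.
value = ast.literal_eval("{'test': [1, 2, (3, 4.5)], 'ok': None}")
print(value)

# The three strings from the post: a call, an expensive power
# expression, and an import-and-run attack.
hostile = [
    "len(None)",
    "100**100**100",
    "__import__('os').system('rm this')",
]

rejected = []
for source in hostile:
    try:
        ast.literal_eval(source)
    except (ValueError, SyntaxError) as exc:
        # Nothing was executed; the input was refused while being
        # checked node by node.
        rejected.append(source)
        print("refused:", source)
```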


> For human readable serialized data, text format protocol buffers are
> seriously underrated. (Relatedly: underdocumented, too.)

Ironically, literal_eval is designed to process text-format protocols using
human-readable Python syntax for common data types like int, str, and dict.



--
Steve



random832@fastmail.us
On Tue, Jun 9, 2015, at 23:52, Steven D'Aprano wrote:
> > For human readable serialized data, text format protocol buffers are
> > seriously underrated. (Relatedly: underdocumented, too.)
>
> Ironically, literal_eval is designed to process text-format protocols
> using
> human-readable Python syntax for common data types like int, str, and
> dict.

"protocol buffers" is the name of a specific tool.


Chris Angelico
On Wed, Jun 10, 2015 at 1:57 PM,  <random832 at fastmail.us> wrote:

> On Tue, Jun 9, 2015, at 23:52, Steven D'Aprano wrote:
>> > For human readable serialized data, text format protocol buffers are
>> > seriously underrated. (Relatedly: underdocumented, too.)
>>
>> Ironically, literal_eval is designed to process text-format protocols
>> using
>> human-readable Python syntax for common data types like int, str, and
>> dict.
>
> "protocol buffers" is the name of a specific tool.

Yes, it is. But the point is that literal_eval, JSON, and other such
tools are _also_ text-format protocols that serialize to/from human
readable data. I'm not sure what the advantage of protocol buffers is,
but it's not like "human readable" is such a rarity. (It is still a
strike against pickle.)

ChrisA


Devin Jeanpierre
In reply to this post by Steven D'Aprano-11
On Tue, Jun 9, 2015 at 8:52 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Wednesday 10 June 2015 10:47, Devin Jeanpierre wrote:
>
>> Passing around data that can be put into ast.literal_eval is
>> synonymous with passing around data that can be put into eval. It
>> sounds like a trap.
>
> In what way?

I misspoke, and instead of "synonymous", meant "also means".
(Implication, not equivalence.)

>> For human readable serialized data, text format protocol buffers are
>> seriously underrated. (Relatedly: underdocumented, too.)
>
> Ironically, literal_eval is designed to process text-format protocols using
> human-readable Python syntax for common data types like int, str, and dict.

"Protocol buffers" are a specific technology, not an abstract concept,
and literal_eval is not a great idea.

* the common serializer (repr) does not output a canonical form, and
  can serialize things in a way that they can't be deserialized
* there is no schema
* there is no well understood migration story for when the data you
  load and store changes
* it is not usable from other programming languages
* it encourages the use of eval when literal_eval becomes inconvenient
  or insufficient
* It is not particularly well specified or documented compared to the
  alternatives.
* The types you get back differ in python 2 vs 3

For most apps, the alternatives are better. Irmen's serpent library is
strictly better on every front, for example. (Except potentially
security, who knows.)

At least it's better than pickle, security wise. Reliability wise,
repr is a black hole, so no dice. :(
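
The first bullet (repr can produce text that cannot be read back) can be illustrated with a quick check. This is only a sketch; the sample values are arbitrary picks, not an exhaustive survey.

```python
import ast
import datetime

# Arbitrary sample values: the first five are plain literals,
# the last two have a repr() that is not a Python literal.
samples = [
    12345,
    1.2345,
    "spam",
    None,
    [1, (2, 3)],
    float("inf"),                  # repr is 'inf', a bare name
    datetime.date(2015, 6, 10),    # repr is a constructor call
]

results = []
for value in samples:
    text = repr(value)
    try:
        ok = ast.literal_eval(text) == value
    except (ValueError, SyntaxError):
        ok = False
    results.append(ok)
    print(f"{text:32} round-trips: {ok}")
```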

-- Devin


Steven D'Aprano-11
In reply to this post by Steven D'Aprano-11
On Wednesday 10 June 2015 13:57, random832 at fastmail.us wrote:

> On Tue, Jun 9, 2015, at 23:52, Steven D'Aprano wrote:
>> > For human readable serialized data, text format protocol buffers are
>> > seriously underrated. (Relatedly: underdocumented, too.)
>>
>> Ironically, literal_eval is designed to process text-format protocols
>> using
>> human-readable Python syntax for common data types like int, str, and
>> dict.
>
> "protocol buffers" is the name of a specific tool.

It is? It sounds like a generic term for, you know, a buffer used by a
protocol. I live and learn.

https://developers.google.com/protocol-buffers/docs/pythontutorial

You have to:

- write a data template, in a separate file; just don't call it a schema,
because this isn't XML;

- don't forget the technically-optional-but-recommended (and required if you
use other languages) "package" header, which is completely redundant in
Python;

- run a separate compiler over that template, which will generate Python
classes for you; just don't think that these classes are first class
citizens that you can extend using inheritance, because they're not;

- import the generated module containing those classes;

- and now you have your very own private pickle-like format, yay!


I'm sure that this has its uses for big, complex projects, but for
lightweight needs, it seems over-engineered and unPythonic.

--
Steve



Steven D'Aprano-11
In reply to this post by Steven D'Aprano-11
On Wednesday 10 June 2015 14:48, Devin Jeanpierre wrote:

[...]
> and literal_eval is not a great idea.
>
> * the common serializer (repr) does not output a canonical form, and
>   can serialize things in a way that they can't be deserialized

For literals, the canonical form is that understood by Python. I'm pretty
sure that these have been stable since the days of Python 1.0, and will
remain so pretty much forever:

ints: 12345
floats: 1.2345
strings: "spam"
None
True
False
lists, tuples, dicts and sets containing the above

There may be a few differences between Python 2 and 3, e.g. no set literal
in Python 2, but in general the Python syntax is well-known and understood
by anyone programming in Python.


> * there is no schema
> * there is no well understood migration story for when the data you
>   load and store changes

literal_eval is not a serialisation format itself. It is a primitive
operation usable when serialising. E.g. you might write out a simple Unix-
style rc file of key:value pairs:


length=23.45
width=10.95
landscape=False

split on "=" and call literal_eval on the value.

This is a perfectly reasonable light-weight solution for simple
serialisation needs.
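
A minimal sketch of that rc-file approach, using the three lines above as input. The function name is made up, and skipping blank lines and `#` comments is an extra assumption, not part of the description.

```python
import ast

RC_TEXT = """\
length=23.45
width=10.95
landscape=False
"""

def parse_rc(text):
    """Parse key=value lines, evaluating each value as a Python literal."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = ast.literal_eval(value.strip())
    return settings

config = parse_rc(RC_TEXT)
print(config)
```

A value that is not a literal makes `ast.literal_eval` raise `ValueError` or `SyntaxError`, so a malformed file fails loudly instead of running code.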


> * it is not usable from other programming languages

That's okay, we're not writing in other programming languages :-)


> * it encourages the use of eval when literal_eval becomes inconvenient
>   or insufficient

I don't think so. I think that people who make the effort to import ast and
call ast.literal_eval are fully aware of the dangers of eval and aren't
silly enough to start using eval.


> * It is not particularly well specified or documented compared to the
>   alternatives.
> * The types you get back differ in python 2 vs 3

Doesn't matter. The types you *write* are different in Python 2 vs 3, so of
course you do.


> For most apps, the alternatives are better. Irmen's serpent library is
> strictly better on every front, for example. (Except potentially
> security, who knows.)

Beyond simple needs, like rc files, literal_eval is not sufficient. You
can't use it to deserialise arbitrary objects. That might be a feature, but
if you need something more powerful than basic ints, floats, strings and a
few others, literal_eval will not be powerful enough.

I think we are in violent agreement :-)

--
Steve



Neal Becker
In reply to this post by Chris Warrick
Chris Warrick wrote:

> On Tue, Jun 9, 2015 at 8:08 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>> One of the most annoying problems with py2/3 interoperability is that the
>> pickle formats are not compatible.  There must be many who, like myself,
>> often use pickle format for data storage.
>>
>> It certainly would be a big help if py3 could read/write py2 pickle
>> format. You know, backward compatibility?
>
> Don't use pickle. It's unsafe - it executes arbitrary code, which
> means someone can give you a pickle file that will delete all your
> files or eat your cat.
>
> Instead, use a safe format that has no ability to execute code, like
> JSON. It will also work with other programming languages and
> environments if you ever need to talk to anyone else.
>
> But, FYI: there is backwards compatibility if you ask for it, in the
> form of protocol versions. That's all you should know - again, don't
> use pickle.
>

I believe a good native serialization system is essential for any modern
programming language.  If pickle isn't it, we need something else that can
serialize all language objects.  Or, are you saying, it's impossible to do
this safely?



Robert Kern-2
On 2015-06-10 12:04, Neal Becker wrote:

> Chris Warrick wrote:
>
>> On Tue, Jun 9, 2015 at 8:08 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>>> One of the most annoying problems with py2/3 interoperability is that the
>>> pickle formats are not compatible.  There must be many who, like myself,
>>> often use pickle format for data storage.
>>>
>>> It certainly would be a big help if py3 could read/write py2 pickle
>>> format. You know, backward compatibility?
>>
>> Don't use pickle. It's unsafe - it executes arbitrary code, which
>> means someone can give you a pickle file that will delete all your
>> files or eat your cat.
>>
>> Instead, use a safe format that has no ability to execute code, like
>> JSON. It will also work with other programming languages and
>> environments if you ever need to talk to anyone else.
>>
>> But, FYI: there is backwards compatibility if you ask for it, in the
>> form of protocol versions. That's all you should know - again, don't
>> use pickle.
>
> I believe a good native serialization system is essential for any modern
> programming language.  If pickle isn't it, we need something else that can
> serialize all language objects.  Or, are you saying, it's impossible to do
> this safely?

By the very nature of the stated problem: serializing all language objects.
Being able to construct any object, including instances of arbitrary classes,
means that arbitrary code can be executed. All I have to do is make a pickle
file for an object that claims that its constructor is shutil.rmtree().

This is fine in some use cases (e.g. wire format for otherwise-secured
communication between two endpoints under your complete control), but it is
worrying in others, like your use case of data storage (and presumably sharing).
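
A harmless sketch of the mechanism, with the built-in `sorted` standing in for something destructive like `shutil.rmtree`. The class names are made up. The second half follows the approach in the pickle documentation of overriding `Unpickler.find_class`, here in its strictest form, which refuses every global lookup.

```python
import io
import pickle

class Innocent:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # Any importable callable could be named here instead.
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Innocent())

# Loading runs the named callable; no Innocent instance comes back.
result = pickle.loads(payload)
print(type(result).__name__, result)

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses to look up any class or function by name."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

blocked = False
try:
    NoGlobalsUnpickler(io.BytesIO(payload)).load()
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```

A loader this strict can only rebuild built-in containers and scalars, which is the trade-off in question: the more object types a format can reconstruct, the more code it can be made to call.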

Python 2/3 is also the least of your compatibility worries there. Refactor a
class to a different module, or did one of your third-party dependencies do
this? Poof! Your pickle files no longer work.
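
The refactoring problem comes from pickles storing classes by import path rather than by value. A sketch of the failure and one possible workaround follows. The module name `oldpkg.models` is hypothetical, `collections.OrderedDict` merely stands in for a class that has moved, and the pickle bytes are written by hand so the example is self-contained.

```python
import io
import pickle

# Hand-written pickle that says: import oldpkg.models.OrderedDict and
# call it with no arguments (GLOBAL, EMPTY_TUPLE, REDUCE, STOP).
# "oldpkg.models" is a hypothetical module that no longer exists.
stale_blob = b"coldpkg.models\nOrderedDict\n)R."

# The plain loader fails because the recorded import path is gone.
try:
    pickle.loads(stale_blob)
except ImportError as exc:
    print("load failed:", exc)

# Assumed mapping from old locations to new ones.
RENAMED = {("oldpkg.models", "OrderedDict"): ("collections", "OrderedDict")}

class RemappingUnpickler(pickle.Unpickler):
    """Redirects lookups of moved classes before importing them."""
    def find_class(self, module, name):
        module, name = RENAMED.get((module, name), (module, name))
        return super().find_class(module, name)

restored = RemappingUnpickler(io.BytesIO(stale_blob)).load()
print(type(restored).__module__, type(restored).__name__)
```

This only helps when the move is known and the class still unpickles the same way; it does nothing for classes whose attributes changed.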

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

