[issue14371] Add support for bzip2 compression to the zipfile module

STINNER Victor

New submission from Serhiy Storchaka <[hidden email]>:

The ZIP File Format Specification (http://www.pkware.com/documents/casestudies/APPNOTE.TXT) has supported bzip2 compression since at least 2003. Since bzip2 is included in the Python standard library, it would be nice to add support for this method to zipfile. This would allow processing more foreign zip files and creating more compact distributions.

The proposed patch adds a new method, ZIP_BZIP2, which is detected automatically when unpacking and can be used for packing.
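
A minimal sketch of how the proposed method might be used, assuming a Python version where the constant is available as zipfile.ZIP_BZIP2 (3.3 or later) and where the bz2 module is built. The member name and payload are made up for illustration.

```python
import io
import zipfile

# Hypothetical payload; repetitive data compresses well.
payload = b"example data " * 1000

# Pack: request bzip2 for every member written to this archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_BZIP2) as zf:
    zf.writestr("data.txt", payload)

# Unpack: the method is read from the archive, so none is passed here.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("data.txt")
    restored = zf.read("data.txt")

print(info.compress_type == zipfile.ZIP_BZIP2, len(payload), info.compress_size)
```

The reading side needs no extra argument because each member records its own compression method.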

----------
components: Library (Lib)
files: bzip2_in_zip.patch
keywords: patch
messages: 156394
nosy: storchaka
priority: normal
severity: normal
status: open
title: Add support for bzip2 compression to the zipfile module
type: enhancement
versions: Python 3.3
Added file: http://bugs.python.org/file24956/bzip2_in_zip.patch

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14371>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Changes by Martin v. Löwis <[hidden email]>:


Added file: http://bugs.python.org/file24964/bzip2_in_zip_review.patch

Martin v. Löwis <[hidden email]> added the comment:

Can you please submit a contributor form?

http://python.org/psf/contrib/contrib-form/
http://python.org/psf/contrib/

----------
nosy: +loewis

Martin v. Löwis <[hidden email]> added the comment:

The patch looks good. Can you also provide a test case?

----------

Serhiy Storchaka <[hidden email]> added the comment:

I am working on this. Should I add tests to test_zipfile.py or create a new test_zipfile_bzip2.py?

Should the documentation add a note that not all programs understand bzip2 compression (older versions of Python do not), but that Info-ZIP's unzip does? My English is not good enough for the documentation.

----------

Martin v. Löwis <[hidden email]> added the comment:

Please add it to test_zipfile.

As for the documentation, I propose the wording

"bzip2 compression was added to the zip file format in 2001. However, even more recent tools (including older Python releases) may not support it, causing either refusal to process the zip file altogether, or failure to extract individual files."

I'm not a native speaker of English, either. Feel free to put things through Google translate; some native speaker will pick up the text and correct it.

----------

Serhiy Storchaka <[hidden email]> added the comment:

Thanks to the tests, I found the error. Since bzip2 is a block algorithm, the decompressor needs to consume a certain amount of data before it begins to return any. As a result, reading in small chunks currently hits a premature end of data. I'm working on a fix.
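
The behaviour described above can be reproduced with the standard bz2 module alone. This is a sketch, not the code from the patch: the helper name decompress_in_chunks and the chunk size are made up for illustration.

```python
import bz2


def decompress_in_chunks(compressed, chunk_size=16):
    """Feed compressed data in small chunks and count empty results."""
    decomp = bz2.BZ2Decompressor()
    pieces = []
    empty_returns = 0
    for start in range(0, len(compressed), chunk_size):
        out = decomp.decompress(compressed[start:start + chunk_size])
        if not out:
            # The decompressor has not yet seen a complete block.
            empty_returns += 1
        pieces.append(out)
    return b"".join(pieces), empty_returns


original = b"hello world " * 1000
restored, empties = decompress_in_chunks(bz2.compress(original))
print(restored == original, empties)
```

A reader that treats an empty result as end of data stops too early, so a loop over small chunks has to keep feeding input until the decompressor produces output or the input is exhausted.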

----------

Changes by Serhiy Storchaka <[hidden email]>:


Added file: http://bugs.python.org/file24982/bzip2_in_zip_tests.patch

Serhiy Storchaka <[hidden email]> added the comment:

All errors are fixed and all tests pass. Unfortunately, the patch turned out larger than expected. This was necessary to work correctly and efficiently with large bzip2 buffers (other codecs may benefit as well).

----------
Added file: http://bugs.python.org/file24996/bzip2_in_zip_2.patch

Changes by Antoine Pitrou <[hidden email]>:


----------
nosy: +nadeem.vawda
stage:  -> patch review

Nadeem Vawda <[hidden email]> added the comment:

[Adding Alan McIntyre, who is listed as zipfile's maintainer.]


I haven't yet had a chance to properly familiarize myself with the
zipfile module, but I did notice an issue in the changes to ZipExtFile's
read() method. The existing code uses the b"".join() idiom for linear-
time concatenation, but the patch replaces it with a version that does
"buf += data" after each read. CPython can (I think) do this efficiently,
but it can be much slower on other implementations.


Martin:
> As for the documentation, I propose the wording
>
> "bzip2 compression was added to the zip file format in 2001. However, even more recent tools (including older Python releases) may not support it, causing either refusal to process the zip file altogether, or failure to extract individual files."

How about this?

"The zip format specification has included support for bzip2 compression
since 2001. However, some tools (including older Python releases) do not
support it, and may either refuse to process the zip file altogether, or
fail to extract individual files."

----------
nosy: +alanmcintyre

Serhiy Storchaka <[hidden email]> added the comment:

> The existing code uses the b"".join() idiom for linear-
> time concatenation, but the patch replaces it with a version that does
> "buf += data" after each read.

You have mixed things up. The existing code uses ``buf += data``; I took the
liberty of replacing it with the ``b"".join()`` idiom. The bzip2 codec has to
deal with large pieces of data, so this may now matter. read1 still uses
``buf += data``, but not in a loop: it concatenates only two pieces.

> "The zip format specification has included support for bzip2 compression

Thank you. Can you offer a variant that covers both bzip2 and lzma
(supported since 2006)? I will put it in the upcoming patch that adds support
for lzma compression to the zipfile module.

----------

Nadeem Vawda <[hidden email]> added the comment:

> You have mixed things up. The existing code uses ``buf += data``; I took the
> liberty of replacing it with the ``b"".join()`` idiom. The bzip2 codec has to
> deal with large pieces of data, so this may now matter. read1 still uses
> ``buf += data``, but not in a loop: it concatenates only two pieces.

My mistake; I confused the bodies of read() and read1().


> Thank you. Can you offer a variant that covers both bzip2 and lzma
> (supported since 2006)? I will put it in the upcoming patch that adds support
> for lzma compression to the zipfile module.

"The zip format specification has included support for bzip2 compression
since 2001, and for LZMA compression since 2006. However, some tools
(including older Python releases) do not support these compression
methods, and may either refuse to process the zip file altogether,
or fail to extract individual files."
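
One way to see ahead of time which members may trouble such tools is to inspect each member's compression method. The following is only a sketch: the helper describe_methods and the METHOD_NAMES table are made up for illustration, and it assumes a Python where zipfile.ZIP_BZIP2 and zipfile.ZIP_LZMA exist (3.3 or later).

```python
import io
import zipfile

# Illustrative labels for the methods the zipfile module knows about.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflate",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def describe_methods(zip_source):
    """Map each member name to a label for its compression method."""
    with zipfile.ZipFile(zip_source) as zf:
        return {
            info.filename: METHOD_NAMES.get(info.compress_type, "unknown")
            for info in zf.infolist()
        }


# Build a small in-memory archive that mixes two methods.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"plain", compress_type=zipfile.ZIP_STORED)
    zf.writestr("b.txt", b"squeezed " * 100, compress_type=zipfile.ZIP_BZIP2)

methods = describe_methods(buf)
print(methods)
```

Members reported as bzip2 or lzma are the ones an older tool may refuse or fail to extract.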

----------

Serhiy Storchaka <[hidden email]> added the comment:

Fixed a regression in decompression.

Nadeem Vawda, we were both wrong. `buf += data` is noticeably faster than `b''.join()` in CPython.

----------
Added file: http://bugs.python.org/file25006/bzip2_in_zip_3.patch

Changes by Jesús Cea Avión <[hidden email]>:


----------
nosy: +jcea

Martin v. Löwis <[hidden email]> added the comment:

What's the status of your contrib form?

----------

Antoine Pitrou <[hidden email]> added the comment:

> `buf += data` is noticeably faster than `b''.join()` in CPython.

Perhaps because your system's memory allocator is extremely good (or buf is always very small), but b''.join() is far more robust.
Another alternative is accumulating in a bytearray, since it uses overallocation for linear time appending.
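
For comparison, the three accumulation strategies discussed in this thread can be sketched side by side. The function names and sample chunks are made up for illustration, and no timings are claimed here: relative speed depends on the interpreter, the allocator, and the chunk sizes.

```python
def concat_plus_equals(chunks):
    """Repeated bytes concatenation; each += may copy the whole buffer."""
    buf = b""
    for chunk in chunks:
        buf += chunk
    return buf


def concat_join(chunks):
    """Collect the pieces and join them once at the end."""
    return b"".join(chunks)


def concat_bytearray(chunks):
    """Append into a mutable bytearray, then convert to bytes once."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk  # in-place extend of the bytearray
    return bytes(buf)


# Hypothetical input: 1000 chunks of 64 bytes each.
chunks = [bytes([i % 256]) * 64 for i in range(1000)]
results = [f(chunks) for f in (concat_plus_equals, concat_join, concat_bytearray)]
print(len(results[0]), results[0] == results[1] == results[2])
```

All three produce the same bytes; they differ only in how much copying happens along the way.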

----------
nosy: +pitrou

Serhiy Storchaka <[hidden email]> added the comment:

> What's the status of your contrib form?

Oops. I put this off to study it in detail and then forgot.

I will send the form as soon as I get to a printer and a scanner.

----------

Serhiy Storchaka <[hidden email]> added the comment:

> Perhaps because your system's memory allocator is extremely good (or buf is always very small), but b''.join() is far more robust.
> Another alternative is accumulating in a bytearray, since it uses overallocation for linear time appending.

I thought this relied on a special optimization mentioned on python-dev,
but I could not find it in the code. Perhaps it was never implemented.

In this particular case, the bytes appending is performed only once (plus
probably many appends of b''). Exceptions are possible only in
pathological cases, for example when the compressed data is much larger
than the uncompressed data. The current implementation uses `buf += data`;
if someone wants to change it, it won't be me.

----------

Roundup Robot <[hidden email]> added the comment:

New changeset 028e8e0b03e8 by Martin v. Löwis in branch 'default':
Issue #14371: Support bzip2 in zipfile module.
http://hg.python.org/cpython/rev/028e8e0b03e8

----------
nosy: +python-dev
