Proposal: close the PyPI file-replacement loophole

Richard Jones-7
Hi catalog-sig,

When we initially implemented file upload to PyPI it was our intention
that the file be immutable once uploaded. The goal was to make things
significantly simpler for end users - there would only ever be one
file with a given name. If the content changed then so must the name
(typically by creating a new release version.)

After the upload facility was put in place we also added the ability
to delete files uploaded to pypi. This created a loophole: if a
package owner knew how to they could delete the file and re-upload,
thus circumventing the replacement protection.

I'm considering closing this loophole by retaining a record of the
uploaded file (though not the contents) so that future uploads with
the same name wouldn't be allowed. I understand that this is how the
ruby gem archive handles deletion of files.
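A minimal sketch of the "retain a record of the name, not the contents" idea, assuming a hypothetical in-memory `FileRegistry` (PyPI's actual storage and schema are not described in this thread):

```python
class FileRegistry:
    """Hypothetical stand-in for the index's file table (illustration only)."""

    def __init__(self):
        self._live = {}          # filename -> content currently being served
        self._ever_used = set()  # every filename ever uploaded, kept after deletion

    def upload(self, filename, content):
        # Reject any name that has been used before, even if it was deleted since.
        if filename in self._ever_used:
            raise ValueError(f"{filename} was uploaded before; use a new version")
        self._ever_used.add(filename)
        self._live[filename] = content

    def delete(self, filename):
        # The contents go away, but the record of the name is retained.
        self._live.pop(filename, None)

    def is_available(self, filename):
        return filename in self._live
```

With this shape, deleting a file still works, but a delete followed by a re-upload under the same name is refused.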

Your thoughts?


     Richard
_______________________________________________
Catalog-SIG mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/catalog-sig

Re: Proposal: close the PyPI file-replacement loophole

Robert Collins
On Mon, Jan 30, 2012 at 12:47 PM, Richard Jones <[hidden email]> wrote:
> I'm considering closing this loophole by retaining a record of the
> uploaded file (though not the contents) so that future uploads with
> the same name wouldn't be allowed. I understand that this is how the
> ruby gem archive handles deletion of files.

Please allow never-downloaded files to be replaced, or perhaps files
below some low download threshold (like 2 or 3). It's handy when a bad
upload is made to just fix it.

-Rob

Re: Proposal: close the PyPI file-replacement loophole

"Martin v. Löwis"
In reply to this post by Richard Jones-7
> When we initially implemented file upload to PyPI it was our intention
> that the file be immutable once uploaded. The goal was to make things
> significantly simpler for end users - there would only ever be one
> file with a given name. If the content changed then so must the name
> (typically by creating a new release version.)

I don't actually recall that being a goal :-)

> Your thoughts?

-1. There are plenty of ways to check whether the file was modified if
you already have a copy of it. Users just need to accept that files may
change, and package authors need to accept that users may retain old
copies of a file even after they replaced it.
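One such check, sketched under the assumption that you kept a local copy of the file (or at least its digest) from the time you tested against it; the function names are illustrative, not part of any PyPI tooling:

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(local_copy, fresh_download):
    """True if the freshly downloaded file is byte-identical to the kept copy."""
    return sha256_of(local_copy) == sha256_of(fresh_download)
```

This only helps a user who already holds the earlier file or its digest; it says nothing when all they have is a version pin.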

Just a week ago I got a comment from a user explicitly expressing thanks
for the ability to replace files after already publishing them.

Regards,
Martin

Re: Proposal: close the PyPI file-replacement loophole

Donald Stufft
I'm very much +1 on this. I don't see any use case for modifying a file after it has been released. It means that if I install version 1.0 of a package today, tomorrow version 1.0 of that package might be different and introduce incompatibilities. It lowers the ability of anyone using PyPI or its mirrors to be secure in the fact that, having tested their application with versions X, Y, Z of a library, it will continue to work exactly the same with versions X, Y, Z of that library.

There isn't a limited set of version numbers; if someone makes a mistake in packaging, they can delete the file and increase the version number.


Re: Proposal: close the PyPI file-replacement loophole

Donald Stufft
In reply to this post by "Martin v. Löwis"

On Sunday, January 29, 2012 at 7:38 PM, "Martin v. Löwis" wrote:

>> When we initially implemented file upload to PyPI it was our intention
>> that the file be immutable once uploaded. The goal was to make things
>> significantly simpler for end users - there would only ever be one
>> file with a given name. If the content changed then so must the name
>> (typically by creating a new release version.)
>
> I don't actually recall that being a goal :-)
>
>> Your thoughts?
>
> -1. There are plenty of ways to check whether the file was modified if
> you already have a copy of it. Users just need to accept that files may
> change, and package authors need to accept that users may retain old
> copies of a file even after they replaced it.

I don't always have a copy of the file, I might only have a reference such as slumber==0.3.0.


Re: Proposal: close the PyPI file-replacement loophole

Richard Jones-7
In reply to this post by Robert Collins
On 30 January 2012 10:59, Robert Collins <[hidden email]> wrote:
> On Mon, Jan 30, 2012 at 12:47 PM, Richard Jones <[hidden email]> wrote:
>> I'm considering closing this loophole by retaining a record of the
>> uploaded file (though not the contents) so that future uploads with
>> the same name wouldn't be allowed. I understand that this is how the
>> ruby gem archive handles deletion of files.
>
> Please allow for never-downloaded files to be replaced; or perhaps
> some low threshold (like 2 or 3) downloads. Its handy when a bad
> upload is made to just-fix-it.

This is tricky: download counts are only tallied once every 24 hours
using the local web server logs and grabbing the download count files
from the mirrors.


     Richard

Re: Proposal: close the PyPI file-replacement loophole

Thomas Lotze-2
In reply to this post by Richard Jones-7
Richard Jones wrote:

> I'm considering closing this loophole by retaining a record of the
> uploaded file (though not the contents) so that future uploads with the
> same name wouldn't be allowed. I understand that this is how the ruby gem
> archive handles deletion of files.

I'd even suggest disallowing the deletion of files in the first place and
retaining them, contents included. I regularly see trouble arising from
files having been deleted from PyPI that are still needed even after their
authors considered them obsolete. This may simply be due to version
pinning in some application deployment or similar.

--
Thomas




Re: Proposal: close the PyPI file-replacement loophole

Richard Jones-7

This has been discussed previously (see the mailing list archive.) As a matter of policy we will always allow users to delete their content from pypi.


Re: Proposal: close the PyPI file-replacement loophole

"Martin v. Löwis"
In reply to this post by Donald Stufft
>> -1. There are plenty of ways to check whether the file was modified if
>> you already have a copy of it. Users just need to accept that files may
>> change, and package authors need to accept that users may retain old
>> copies of a file even after they replaced it.
> I don't always have a copy of the file, I might only have a reference
>  such as slumber==0.3.0.

All the better. A responsible author, when replacing an existing file,
should make sure that it is reasonably compatible with the previous
copy of the file. E.g. the update may include corrected typos or include
files that the previous copy didn't include; the previous copy may have
actually not worked at all in some circumstances.

Now, it may be that the author does break your code by mistake when
replacing a file. You should then report that to the author, asking
him to restore the original file and be more careful in the future.

Regards,
Martin

Fw: Proposal: close the PyPI file-replacement loophole

Donald Stufft
Forwarding as I mistakenly sent this directly to Martin, sorry!

Forwarded message:

From: Donald Stufft <[hidden email]>
To: "Martin v. Löwis" <[hidden email]>
Date: Monday, January 30, 2012 3:36:09 AM
Subject: Re: [Catalog-sig] Proposal: close the PyPI file-replacement loophole

A major goal in any deployment/installation system is reproducible builds. Allowing re-uploads goes directly against this goal. I (and a lot of Python developers, I would imagine) purposely pin to specific versions because we *know* those versions work. Currently, if I rely on PyPI or any of its mirrors, I just have to cross my fingers and hope that the file I'm getting is the file that I tested against.

A responsible author wouldn't change his files after people have started using them. Unfortunately, not all authors are responsible, which is why restrictions should be put in place.

I think there are very clear bad things that could happen due to mutable packages. I can't think of a single bad thing that could happen due to immutable packages other than "if the author messed something up he might have to increase his version number". Increasing a version number is a very minor problem compared to breaking software.

So my questions to you are:

1. What is the worst case if packages are made immutable?
2. What is the worst case if they are kept mutable? 
3. Best case for immutable?
4. Best case for mutable?

The answers I can think of are:

1. The author might have to "waste" a version number uploading a fix.
2. The author might break existing software (or introduce major security vulnerabilities), inadvertently or otherwise.
3. People depending on packages can use PyPI and be secure in the fact that what they got today will be the same as what they get tomorrow.
4. People depending on packages can get "secret" bug fixes.

Between the two, the worst case for immutable is basically a no-op, and the worst case for mutable is a very serious problem, one that leads many people to needlessly abandon PyPI when installing packages matters and to use their own internal systems instead. I very strongly feel that the worst case for mutable outweighs the very minor benefit package authors get from being able to re-upload.

On an additional note, a good compromise might be to allow reuploads for the first 30 minutes or an hour, and after that prevent it. You still provide that minor benefit in the only situation it's a valid use in my opinion (the "oh no I just uploaded a package and it was broken"), but you let people be secure in the fact that when I test my software against a specific version, I can install that version over and over again and get the same results.
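A rough sketch of that grace-period rule, assuming the index records when a filename was first uploaded; `GRACE_PERIOD` and `may_replace` are illustrative names, and the 30-minute value is just the lower figure mentioned above:

```python
from datetime import datetime, timedelta

# Assumed window; the suggestion above is "30 minutes or an hour".
GRACE_PERIOD = timedelta(minutes=30)


def may_replace(first_uploaded_at, now, grace=GRACE_PERIOD):
    """Allow replacing a file only while it is still inside the grace window."""
    return now - first_uploaded_at <= grace
```

Unlike a download-count threshold, this check needs only a timestamp stored at upload time, so it does not depend on how often download statistics are tallied.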


Re: Fw: Proposal: close the PyPI file-replacement loophole

David Moss-6
+1 on immutability once a distribution is uploaded to PyPI. The benefits far outweigh the drawbacks. Catering for the "overwrite within a certain time window" scenario just serves to complicate what should be a very simple rule. 

Even though PyPI acts as a passthrough gateway to other file sources (e.g. GitHub), this shouldn't deter us from aspiring to give users greater confidence in the files that are hosted directly on PyPI by making this change.


Re: Proposal: close the PyPI file-replacement loophole

M.-A. Lemburg
In reply to this post by Richard Jones-7
Richard Jones wrote:

> Hi catalog-sig,
>
> When we initially implemented file upload to PyPI it was our intention
> that the file be immutable once uploaded. The goal was to make things
> significantly simpler for end users - there would only ever be one
> file with a given name. If the content changed then so must the name
> (typically by creating a new release version.)
>
> After the upload facility was put in place we also added the ability
> to delete files uploaded to pypi. This created a loophole: if a
> package owner knew how to they could delete the file and re-upload,
> thus circumventing the replacement protection.
>
> I'm considering closing this loophole by retaining a record of the
> uploaded file (though not the contents) so that future uploads with
> the same name wouldn't be allowed. I understand that this is how the
> ruby gem archive handles deletion of files.
>
> Your thoughts?

I don't think that's a good idea, since it would require the
package author to issue a new release whenever something goes wrong
with an upload (e.g. missing files, corrupted archive, etc.).

Please leave the existing logic in place.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 30 2012)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

Re: Proposal: close the PyPI file-replacement loophole

Donald Stufft

On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:

> Richard Jones wrote:
>> Hi catalog-sig,
>>
>> When we initially implemented file upload to PyPI it was our intention
>> that the file be immutable once uploaded. The goal was to make things
>> significantly simpler for end users - there would only ever be one
>> file with a given name. If the content changed then so must the name
>> (typically by creating a new release version.)
>>
>> After the upload facility was put in place we also added the ability
>> to delete files uploaded to pypi. This created a loophole: if a
>> package owner knew how to they could delete the file and re-upload,
>> thus circumventing the replacement protection.
>>
>> I'm considering closing this loophole by retaining a record of the
>> uploaded file (though not the contents) so that future uploads with
>> the same name wouldn't be allowed. I understand that this is how the
>> ruby gem archive handles deletion of files.
>>
>> Your thoughts?
>
> I don't think that's a good idea, since it would require the
> package author to issue a new release whenever something goes wrong
> with an upload (e.g. missing files, corrupted archive, etc.).
>
> Please leave the existing logic in place.

And version numbers are a scarce resource? (Even though I believe it would be acceptable to cover that particular use case by giving a grace period of when you can re upload).


Re: Proposal: close the PyPI file-replacement loophole

M.-A. Lemburg
Donald Stufft wrote:

>
>
> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>
>> Richard Jones wrote:
>>> Hi catalog-sig,
>>>
>>> When we initially implemented file upload to PyPI it was our intention
>>> that the file be immutable once uploaded. The goal was to make things
>>> significantly simpler for end users - there would only ever be one
>>> file with a given name. If the content changed then so must the name
>>> (typically by creating a new release version.)
>>>
>>> After the upload facility was put in place we also added the ability
>>> to delete files uploaded to pypi. This created a loophole: if a
>>> package owner knew how to they could delete the file and re-upload,
>>> thus circumventing the replacement protection.
>>>
>>> I'm considering closing this loophole by retaining a record of the
>>> uploaded file (though not the contents) so that future uploads with
>>> the same name wouldn't be allowed. I understand that this is how the
>>> ruby gem archive handles deletion of files.
>>>
>>> Your thoughts?
>>
>> I don't think that's a good idea, since it would require the
>> package author to issue a new release whenever something goes wrong
>> with an upload (e.g. missing files, corrupted archive, etc.).
>>
>> Please leave the existing logic in place.
> And version numbers are a scarce resource?

No, but having to kick off the whole release process again
just because something went wrong when uploading release files
to PyPI causes plenty of trouble.

> (Even though I believe it would be acceptable to cover that particular use case by giving a grace period of when you can re upload).

Can't we just leave dealing with that problem to the package authors ?
It's their responsibility, not PyPI's.

--
Marc-Andre Lemburg
eGenix.com


Re: Proposal: close the PyPI file-replacement loophole

Donald Stufft
On Monday, January 30, 2012 at 4:46 AM, M.-A. Lemburg wrote:

> Donald Stufft wrote:
>> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>>> Richard Jones wrote:
>>>> Hi catalog-sig,
>>>>
>>>> When we initially implemented file upload to PyPI it was our intention
>>>> that the file be immutable once uploaded. The goal was to make things
>>>> significantly simpler for end users - there would only ever be one
>>>> file with a given name. If the content changed then so must the name
>>>> (typically by creating a new release version.)
>>>>
>>>> After the upload facility was put in place we also added the ability
>>>> to delete files uploaded to pypi. This created a loophole: if a
>>>> package owner knew how to they could delete the file and re-upload,
>>>> thus circumventing the replacement protection.
>>>>
>>>> I'm considering closing this loophole by retaining a record of the
>>>> uploaded file (though not the contents) so that future uploads with
>>>> the same name wouldn't be allowed. I understand that this is how the
>>>> ruby gem archive handles deletion of files.
>>>>
>>>> Your thoughts?
>>>
>>> I don't think that's a good idea, since it would require the
>>> package author to issue a new release whenever something goes wrong
>>> with an upload (e.g. missing files, corrupted archive, etc.).
>>>
>>> Please leave the existing logic in place.
>> And version numbers are a scarce resource?
>
> No, but having to kick off the whole release process again
> just because something went wrong when uploading release files
> to PyPI causes plenty of trouble.

I would assert that almost every time something goes wrong with "uploading to PyPI" it's actually "I didn't package my software correctly". A better solution to people failing to package correctly (missing MANIFEST, whatever) is to test your package prior to uploading to PyPI. Then you don't need mutable files, your release process becomes more robust, your releases become more robust, and the ecosystem in general becomes more robust.

Furthermore, I would still argue that the benefits to the community outweigh the ability for people to skimp on the release process. Either you are doing your releases ad hoc, in which case you don't have much of a release process to begin with, so doing it over again to bump the version by one isn't a huge deal, or you have a large release process, and testing the package before distributing it should be part of it.

>> (Even though I believe it would be acceptable to cover that particular use case by giving a grace period of when you can re upload).
>
> Can't we just leave dealing with that problem to the package authors ?
> It's their responsibility, not PyPI's.

In my opinion, no. PyPI is acting as the central repository; it is its responsibility to make a reasonable effort to protect the people that depend on it. The current solution doesn't just put end developers at risk from a bad author breaking their well-tested software. It also puts the security of their software under the author's watch. The credentials of a particular package's author get leaked/stolen/whatever? Suddenly my software is possibly vulnerable to whatever the person who did it decides to upload.

It puts the integrity of my (proverbial my) software in the hands of a disparate group of authors who may or may not have the same stringent testing that I do. Any Python application that gets installed from PyPI is at risk of mysteriously breaking, even with a "known good" configuration. These bugs are often hard to track down, and it is confusing and difficult to determine why they occur when they never did before.
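Testing a package before uploading, as argued earlier in this message, can start with something as small as checking that the built archive contains the files you expect. A minimal sketch, assuming a tar-based sdist whose members sit under a single top-level `name-version/` directory; `missing_from_sdist` is a hypothetical helper, not a distutils or PyPI function:

```python
import tarfile


def missing_from_sdist(sdist_path, required):
    """Return the required relative paths that are absent from the archive."""
    with tarfile.open(sdist_path) as archive:
        # Strip the leading "name-version/" directory from each member name;
        # entries without a slash (the top-level directory itself) are skipped.
        present = {
            name.split("/", 1)[1]
            for name in archive.getnames()
            if "/" in name
        }
    return sorted(set(required) - present)
```

Running a check like this on the file in `dist/` before uploading catches the "missing files" case without needing to replace anything on the index afterwards.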


Re: Proposal: close the PyPI file-replacement loophole

M.-A. Lemburg
Donald Stufft wrote:

> On Monday, January 30, 2012 at 4:46 AM, M.-A. Lemburg wrote:
>> Donald Stufft wrote:
>>>
>>>
>>> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>>>
>>>> Richard Jones wrote:
>>>>> Hi catalog-sig,
>>>>>
>>>>> When we initially implemented file upload to PyPI it was our intention
>>>>> that the file be immutable once uploaded. The goal was to make things
>>>>> significantly simpler for end users - there would only ever be one
>>>>> file with a given name. If the content changed then so must the name
>>>>> (typically by creating a new release version.)
>>>>>
>>>>> After the upload facility was put in place we also added the ability
>>>>> to delete files uploaded to pypi. This created a loophole: if a
>>>>> package owner knew how to they could delete the file and re-upload,
>>>>> thus circumventing the replacement protection.
>>>>>
>>>>> I'm considering closing this loophole by retaining a record of the
>>>>> uploaded file (though not the contents) so that future uploads with
>>>>> the same name wouldn't be allowed. I understand that this is how the
>>>>> ruby gem archive handles deletion of files.
>>>>>
>>>>> Your thoughts?
>>>>
>>>> I don't think that's a good idea, since it would require the
>>>> package author to issue a new release whenever something goes wrong
>>>> with an upload (e.g. missing files, corrupted archive, etc.).
>>>>
>>>> Please leave the existing logic in place.
>>> And version numbers are a scarce resource?
>>>
>>
>>
>> No, but having to kick off the whole release process again
>> just because something went wrong when uploading release files
>> to PyPI causes plenty of trouble.
>>
> I would assert that almost every time something goes wrong with "uploading to PyPI" it's actually "I didn't package my software correctly". A better solution to people failing to package correctly (missing MANIFEST, whatever) is to test your package prior to uploading to PyPI. Then you don't need mutable files, your release process becomes more robust, your releases become more robust, and the ecosystem in general becomes more robust.
>
> Furthermore, I would still argue that the benefits to the community outweigh the ability for people to skimp on the release process. Either you are doing your releases ad hoc, in which case you don't have much of a release process to begin with, so doing it over again to bump the version by one isn't a huge deal, or you have a large release process, and testing the package before distributing it should be a part of it.

Due to the way PyPI uploads through distutils work by default, it is not
always easy to apply those checks to the uploaded files (distutils
recreates the distribution files when running the upload command and
even though it is possible to have the command reuse an already
created distribution file, that process is tricky and not well known).

Besides, we're not talking about a common case here, just an emergency
exit that can be used if needed.

>>> (Even though I believe it would be acceptable to cover that particular use case by giving a grace period during which you can re-upload).
>>
>>
>> Can't we just leave dealing with that problem to the package authors?
>> It's their responsibility, not PyPI's.
>>
>>
>
> In my opinion, no. PyPI is acting as the central repository, so it is its responsibility to make a reasonable effort to protect the people who depend on it. The current solution doesn't just put end developers at risk from a careless author breaking their well-tested software; it also puts the security of their software under the author's watch. If the credentials of a particular package's author get leaked or stolen, my software is suddenly vulnerable to whatever the person responsible decides to upload.
>
> It puts the integrity of my (proverbial my) software in the hands of a disparate group of authors who may or may not have the same stringent testing that I do. Any Python application that gets installed from PyPI is at risk of mysteriously breaking, even with a "known good" configuration. These bugs are often hard to track down, and it is confusing and difficult to determine why they occur when they never did before.

PyPI uploads get stored with a hash sum, so any such changes can
easily be recognized on the client side, if there's a need.
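
A minimal sketch of what such a client-side check could look like, assuming the client recorded a hex digest when it first fetched the file. The function names and the default algorithm are illustrative assumptions; the thread does not say which hash PyPI stores.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    The algorithm is an assumption; pass whichever one the index
    actually publishes for its files.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_hexdigest, algorithm="sha256"):
    """Return True if the file still matches the digest recorded earlier."""
    return file_digest(path, algorithm) == expected_hexdigest.lower()
```

A check like this only detects a replaced file if the expected digest was pinned somewhere the uploader cannot change, for example in the consuming project's own requirements.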

--
Marc-Andre Lemburg
eGenix.com


Re: Proposal: close the PyPI file-replacement loophole

Yuval Greenfield
On Mon, Jan 30, 2012 at 12:27 PM, M.-A. Lemburg <[hidden email]> wrote:
> Besides, we're not talking about a common case here, just an emergency
> exit that can be used if needed.


This rare "emergency" can be handled by emailing a PyPI admin. It most certainly isn't worth the very real and global security and reliability risks.

In most cases authors won't even email a PyPI admin, because it's so easy to increment the version by 0.0.1 and because it probably isn't an emergency to begin with.

Yuval


Re: Proposal: close the PyPI file-replacement loophole

Jim Fulton
In reply to this post by Richard Jones-7
On Sun, Jan 29, 2012 at 6:47 PM, Richard Jones <[hidden email]> wrote:

> Hi catalog-sig,
>
> When we initially implemented file upload to PyPI it was our intention
> that the file be immutable once uploaded. The goal was to make things
> significantly simpler for end users - there would only ever be one
> file with a given name. If the content changed then so must the name
> (typically by creating a new release version.)
>
> After the upload facility was put in place we also added the ability
> to delete files uploaded to pypi. This created a loophole: if a
> package owner knew how to they could delete the file and re-upload,
> thus circumventing the replacement protection.
>
> I'm considering closing this loophole by retaining a record of the
> uploaded file (though not the contents) so that future uploads with
> the same name wouldn't be allowed. I understand that this is how the
> ruby gem archive handles deletion of files.

+1
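
The proposal quoted above could be sketched roughly as follows: the index keeps the name of every file ever uploaded, even after the content is deleted, and rejects any later upload that reuses a name. The class and attribute names are hypothetical and are not taken from the PyPI code base.

```python
class FileRegistry:
    """Tracks every filename ever uploaded, even after deletion."""

    def __init__(self):
        self.live = {}          # filename -> content currently served
        self.ever_used = set()  # every filename ever uploaded; never shrinks

    def upload(self, filename, content):
        # Reject names that were used before, whether or not the file still exists.
        if filename in self.ever_used:
            raise ValueError("filename already used: %s" % filename)
        self.ever_used.add(filename)
        self.live[filename] = content

    def delete(self, filename):
        # Drop the content but keep the name on record.
        self.live.pop(filename, None)
```

With this in place, delete followed by re-upload of the same name fails, so a fix has to ship under a new name, typically a new version number.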

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton

Re: Proposal: close the PyPI file-replacement loophole

Tarek Ziadé
In reply to this post by M.-A. Lemburg


On Mon, Jan 30, 2012 at 1:46 AM, M.-A. Lemburg <[hidden email]> wrote:
> Donald Stufft wrote:
>>
>>
>> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>>
>>> Richard Jones wrote:
>>>> Hi catalog-sig,
>>>>
>>>> When we initially implemented file upload to PyPI it was our intention
>>>> that the file be immutable once uploaded. The goal was to make things
>>>> significantly simpler for end users - there would only ever be one
>>>> file with a given name. If the content changed then so must the name
>>>> (typically by creating a new release version.)
>>>>
>>>> After the upload facility was put in place we also added the ability
>>>> to delete files uploaded to pypi. This created a loophole: if a
>>>> package owner knew how to they could delete the file and re-upload,
>>>> thus circumventing the replacement protection.
>>>>
>>>> I'm considering closing this loophole by retaining a record of the
>>>> uploaded file (though not the contents) so that future uploads with
>>>> the same name wouldn't be allowed. I understand that this is how the
>>>> ruby gem archive handles deletion of files.
>>>>
>>>> Your thoughts?
>>>
>>> I don't think that's a good idea, since it would require the
>>> package author to issue a new release whenever something goes wrong
>>> with an upload (e.g. missing files, corrupted archive, etc.).
>>>
>>> Please leave the existing logic in place.
>> And version numbers are a scarce resource?
>
> No, but having to kick off the whole release process again
> just because something went wrong when uploading release files
> to PyPI causes plenty of trouble.

It's the opposite that gets you into trouble: once you have uploaded something to PyPI, it potentially gets copied to mirrors.

The mirror protocol, as far as I remember, does not deal with updates of existing files.

IOW, if the release is broken and you fix it on PyPI, it might stay broken on the mirrors, and the inconsistent state is much more trouble.

So +1 for creating a new release whatever state the previously published one is in. Release numbers are not expensive.

What about adding a metadata flag to releases, e.g. "deprecated"? That way client tools know they need to avoid that release, and developers can change the flag.
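
A rough sketch of how a client tool might honour such a flag. The "deprecated" field and the release dictionaries are hypothetical; no such metadata field is defined in this thread, and versions are simplified to tuples of integers.

```python
def pick_release(releases):
    """Return the newest release not flagged as deprecated, or None.

    `releases` is a list of dicts with a "version" tuple and an
    optional "deprecated" bool (both hypothetical field names).
    """
    usable = [r for r in releases if not r.get("deprecated", False)]
    # Tuples compare element by element, so (1, 0, 2) > (1, 0, 1).
    return max(usable, key=lambda r: r["version"]) if usable else None
```

The broken file stays on the index unchanged, so mirrors and existing installs remain consistent, while new installs skip it.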

 



>> (Even though I believe it would be acceptable to cover that particular use case by giving a grace period of when you can re upload).
>
> Can't we just leave dealing with that problem to the package authors ?
> It's their responsibility, not PyPI's.




--
Tarek Ziadé | http://ziade.org


Re: Proposal: close the PyPI file-replacement loophole

M.-A. Lemburg
Tarek Ziadé wrote:

> On Mon, Jan 30, 2012 at 1:46 AM, M.-A. Lemburg <[hidden email]> wrote:
>
>> Donald Stufft wrote:
>>>
>>>
>>> On Monday, January 30, 2012 at 4:23 AM, M.-A. Lemburg wrote:
>>>
>>>> Richard Jones wrote:
>>>>> Hi catalog-sig,
>>>>>
>>>>> When we initially implemented file upload to PyPI it was our intention
>>>>> that the file be immutable once uploaded. The goal was to make things
>>>>> significantly simpler for end users - there would only ever be one
>>>>> file with a given name. If the content changed then so must the name
>>>>> (typically by creating a new release version.)
>>>>>
>>>>> After the upload facility was put in place we also added the ability
>>>>> to delete files uploaded to pypi. This created a loophole: if a
>>>>> package owner knew how to they could delete the file and re-upload,
>>>>> thus circumventing the replacement protection.
>>>>>
>>>>> I'm considering closing this loophole by retaining a record of the
>>>>> uploaded file (though not the contents) so that future uploads with
>>>>> the same name wouldn't be allowed. I understand that this is how the
>>>>> ruby gem archive handles deletion of files.
>>>>>
>>>>> Your thoughts?
>>>>
>>>> I don't think that's a good idea, since it would require the
>>>> package author to issue a new release whenever something goes wrong
>>>> with an upload (e.g. missing files, corrupted archive, etc.).
>>>>
>>>> Please leave the existing logic in place.
>>> And version numbers are a scarce resource?
>>
>> No, but having to kick off the whole release process again
>> just because something went wrong when uploading release files
>> to PyPI causes plenty of trouble.
>>
>
> It's the opposite that gets you into trouble: once you have uploaded
> something at PyPI, it potentially gets copied to mirrors.
>
> The mirror protocol, as far as I remember, does not deal with 'updates of
> existing files'
>
> IOW if the release is broken and you fix it in pypi, it might stay broken
> in mirrors, and the inconsistent state is much more trouble.

That shouldn't be a problem if the mirrors correctly implement
the protocol:

PEP 381:
"""
Mirrors must reduce the amount of data transfered between the central server and the mirror. To
achieve that, they MUST use the changelog() PyPI XML-RPC call, and only refetch the packages that
have been changed since the last time. For each package P, they MUST copy documents /simple/P/ and
/serversig/P. If a package is deleted on the central server, they MUST delete the package and all
associated files. To detect modification of package files, they MAY cache the file's ETag, and MAY
request skipping it using the If-none-match header.
"""

Note that the whole package (including all files) is refetched
whenever something changes. The only allowed optimization is
to look at the file's modification date, which will be different
for newly uploaded files of the same name.

See Martin's pep381client for details:

https://bitbucket.org/loewis/pep381client/src/9d55326ed555/pep381client/__init__.py
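
The decision logic described in the PEP excerpt could be sketched like this. It is not code from pep381client; the function names are illustrative, and the (name, version, timestamp, action) entry shape is an assumption about what the changelog() call returns.

```python
def packages_to_refetch(changelog_entries, last_sync):
    """Names of packages with any change after `last_sync` (a Unix timestamp).

    Each entry is assumed to be a (name, version, timestamp, action) tuple.
    """
    changed = {name
               for (name, version, timestamp, action) in changelog_entries
               if timestamp > last_sync}
    return sorted(changed)


def conditional_headers(cached_etag):
    """Headers asking the server to skip the body if the file is unchanged."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def needs_download(status_code):
    """A 304 Not Modified reply means the cached copy is still current."""
    return status_code != 304
```

Under this scheme a file that was deleted and re-uploaded under the same name shows up in the changelog, and, assuming the server issues a different ETag for the new content, the conditional request returns the full body instead of 304.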

--
Marc-Andre Lemburg
eGenix.com
