Hey,

I find the current download hits to be quite artificial because there are some build systems out there that are fetching releases all day long for their work. There are local mirrors of course, but I am pretty sure projects like zc.buildout are downloaded most of the time by build scripts. And setuptools is downloaded mostly as a dependency of other projects.

Those are valid stats of course, but I was wondering if we could provide more details on why the package was downloaded, e.g. if we were able to distinguish automated downloads from other downloads.

One way I was thinking of was to tell PyPI at download time whether the download was done as a dependency fetch or was a primary download (manual download or "pip install xxx").

Another way would be to ask Continuous Integration systems to use a specific user agent marker.

In the UI we could then make the distinction in the download hits between:

1/ downloads by the end users to install the project
2/ downloads by build tools
3/ "indirect" downloads as dependencies

This is still a bit vague in my head, but I think it would be valuable for people to have such details.

Cheers
Tarek

--
Tarek Ziadé | http://ziade.org
_______________________________________________
Catalog-SIG mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/catalog-sig
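A minimal sketch of the first idea above, telling the index at download time why a file is being fetched. The `X-Download-Reason` header, the reason labels, the tool name and the URL are all hypothetical; nothing like this exists in PyPI or in any installer, so treat it as an illustration only.

```python
from urllib.request import Request

# Hypothetical names: neither this header nor these labels are part of
# PyPI or of any existing download tool.
REASON_HEADER = "X-Download-Reason"
REASONS = {"primary", "dependency", "build-tool"}


def build_download_request(url, tool, reason):
    """Return a Request that states which tool fetches the file, and why."""
    if reason not in REASONS:
        raise ValueError(f"unknown download reason: {reason!r}")
    return Request(url, headers={"User-Agent": tool, REASON_HEADER: reason})


req = build_download_request(
    "https://example.invalid/packages/xxx-1.0.tar.gz",  # placeholder URL
    tool="example-installer/0.1",                        # placeholder tool name
    reason="dependency",
)

# urllib stores header names in capitalized form ("X-download-reason"),
# so the lookup key is normalized the same way.
print(req.get_header("User-agent"))
print(req.get_header(REASON_HEADER.capitalize()))
```

A server could then bucket hits by that header; clients that send nothing would simply stay uncategorised.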
On Tue, Mar 22, 2011 at 5:59 AM, Tarek Ziadé <[hidden email]> wrote:
> Hey,
>
> I find the current download hits to be quite artificial because there
> are some build systems out there that are fetching releases all day
> long for their work. There are local mirrors of course,

Not just local mirrors, but source releases that include things,
download caches, etc...

> but I am
> pretty sure projects like zc.buildout are downloaded most of the time
> by build scripts. And setuptools is downloaded mostly as a dependency
> of other projects.
>
> Those are valid stats of course, but I was wondering if we could
> provide more details on why the package was downloaded, e.g. if we were
> able to distinguish automated downloads from other downloads.
>
> One way I was thinking of was to tell PyPI at download time whether the
> download was done as a dependency fetch or was a primary download
> (manual download or "pip install xxx").

I don't know why downloading something as part of a buildout would be any
different than doing a "pip install". I almost never download anything except
with buildout.

> Another way would be to ask Continuous Integration systems to use a
> specific user agent marker.
>
> In the UI we could then make the distinction in the download hits between:
>
> 1/ downloads by the end users to install the project
> 2/ downloads by build tools
> 3/ "indirect" downloads as dependencies
>
> This is still a bit vague in my head, but I think it would be valuable
> for people to have such details.

I think it would help to ask what the goals of the statistics are?
The statistics are presumably used to answer some questions. What are
those questions?

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
On Tue, Mar 22, 2011 at 11:33 AM, Jim Fulton <[hidden email]> wrote:
...
>
> I don't know why downloading something as part of a buildout would be any
> different than doing a "pip install". I almost never download anything except
> with buildout.

Because when you are running a buildout to install Plone, what you are really doing is "installing Plone" -- downloading setuptools within this process is just something the build tool does to work, and does not necessarily mean the final app uses it.

A fresh buildout call w/ the bootstrap script == one hit to zc.buildout + one hit to setuptools

When you do an explicit "pip install XXX" you are installing XXX as an end user.

>> Another way would be to ask Continuous Integration systems to use a
>> specific user agent marker.
>>
>> In the UI we could then make the distinction in the download hits between:
>>
>> 1/ downloads by the end users to install the project
>> 2/ downloads by build tools
>> 3/ "indirect" downloads as dependencies
>>
>> This is still a bit vague in my head, but I think it would be valuable
>> for people to have such details.
>
> I think it would help to ask what the goals of the statistics are?
> The statistics are presumably used to answer some questions. What are
> those questions?

A/ Is my project, which provides end-user scripts but also modules that can be reused by other apps:

   1/ being installed by end users explicitly via easy_install, pip or a direct distutils install
   2/ just pulled in as a dependency for another project

B/ Were the 126543265423 download hits I get for my project made by automated build scripts or by installations?

C/ How can we differentiate the "end user" projects in PyPI from build tools like zc.buildout or setuptools?

D/ For which projects is my project mostly downloaded as a dependency?

> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
>

--
Tarek Ziadé | http://ziade.org
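Questions A/ and D/ could be answered from the download log if each hit carried a reason and, for dependency fetches, the name of the project being installed. The sketch below assumes such records exist; the field layout, the reason labels and the sample rows are made up for illustration and are not real PyPI data.

```python
from collections import Counter

# Made-up sample hits: (package, reason, requested_for).
# "requested_for" names the project whose installation pulled the package in.
hits = [
    ("xxx", "primary", None),
    ("xxx", "dependency", "app-a"),
    ("xxx", "dependency", "app-a"),
    ("xxx", "dependency", "app-b"),
    ("xxx", "build-tool", None),
    ("yyy", "primary", None),
]


def summarize(hits, package):
    """Count one package's hits by reason and by requesting project."""
    by_reason = Counter()
    dependents = Counter()
    for name, reason, requested_for in hits:
        if name != package:
            continue
        by_reason[reason] += 1
        if reason == "dependency" and requested_for is not None:
            dependents[requested_for] += 1
    return by_reason, dependents


by_reason, dependents = summarize(hits, "xxx")
print(dict(by_reason))            # hits per category (question A/)
print(dependents.most_common())   # projects pulling "xxx" in (question D/)
```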
Tarek Ziadé wrote:
> On Tue, Mar 22, 2011 at 11:33 AM, Jim Fulton <[hidden email]> wrote:
...
>> I think it would help to ask what the goals of the statistics are? The
>> statistics are presumably used to answer some questions. What are those
>> questions?
>
> A/ Is my project, which provides end-user scripts but also modules that can
> be reused by other apps:
>
> 1/ being installed by end users explicitly via easy_install, pip or a
> direct distutils install
> 2/ just pulled in as a dependency for another project

These questions are already too technical for judging how the stats should be computed. I think what consumers of the stats actually want to know and what the stats therefore need to be able to answer in the end is more along the lines of:

- Has my project ever been used by other people? Is it worth my time to make a nice distribution of it?

- Is my project still being used? How many people will get mad at me if I make incompatible changes?

- How many hits does my project get compared with "the competition"? What's my "market share"? Am I cool? ;o)

Not that I'd find that stuff overly interesting myself, but unless there's a really good reason to add more details to the stats, I'd strongly prefer the interaction with PyPI to remain as simple as possible.

--
Thomas
On Tue, Mar 22, 2011 at 11:58, Tarek Ziadé <[hidden email]> wrote:
>> I don't know why downloading something as part of a buildout would be any
>> different than doing a "pip install". I almost never download anything except
>> with buildout.
>
> Because when you are running a buildout to install Plone, what you
> are really doing is "installing Plone" -- downloading setuptools
> within this process is just something the build tool does to work, and
> does not necessarily mean the final app uses it.

Hi Tarek,
from my point of view, this question is answered with categories:

- Plone is an application = downloaded for itself
- Python libs = dependency
- Buildout or setuptools = build tools

--
Sebastien Douche <[hidden email]>
Twitter: @sdouche (agile, lean, python, git, open source)
On 23/03/2011 06:22, Thomas Lotze wrote:
> Tarek Ziadé wrote:
>
>> On Tue, Mar 22, 2011 at 11:33 AM, Jim Fulton<[hidden email]> wrote:
...
>>> I think it would help to ask what the goals of the statistics are? The
>>> statistics are presumably used to answer some questions. What are those
>>> questions?

Having a user agent defined in the clients connecting to PyPI would also let us compute statistics on the usage of such tools (xx% of all the downloads on pypi.python.org are made by buildout, by the distutils2 index crawler, by pip, etc.)

I'm +1 on having CI tools use specific HTTP headers so that their downloads are not counted as "user downloads". We can probably store this information in a different place and display clearly what the number of downloads for CI tools is on pypi.python.org

--
Alexis - http://notmyidea.org
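The "xx% of all the downloads" figure could be computed roughly as follows, assuming each logged hit keeps its User-Agent string and that tools identify themselves in the usual `name/version` form. The function names and the sample strings are illustrative assumptions, not what any tool is known to send.

```python
from collections import Counter


def client_name(user_agent):
    """Return the lower-cased tool name from a 'name/version ...' string."""
    parts = (user_agent or "").split()
    if not parts:
        return "unknown"
    return parts[0].split("/")[0].lower()


def client_shares(user_agents):
    """Map each client name to its percentage of the given hits."""
    counts = Counter(client_name(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Illustrative User-Agent values only.
sample = ["pip/1.0", "pip/1.0", "zc.buildout/1.5", "Python-urllib/2.7"]
for name, share in sorted(client_shares(sample).items()):
    print(f"{name}: {share:.1f}%")
```

Clients that do not set their own User-Agent all fall into whatever default their HTTP library sends, which is why the tools would need to identify themselves for the breakdown to mean much.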
2011/3/27 Alexis Métaireau <[hidden email]>:
> Having a user agent defined in the clients connecting to PyPI would also
> let us compute statistics on the usage of such tools (xx% of all the
> downloads on pypi.python.org are made by buildout, by the distutils2 index
> crawler, by pip, etc.)

I don't object to this, but I don't know that it tells you anything. It is better than everything showing up as a module from the Python standard library, which clearly doesn't tell us much.

> I'm +1 on having CI tools use specific HTTP headers so that their
> downloads are not counted as "user downloads". We can probably store this
> information in a different place and display clearly what the number of
> downloads for CI tools is on pypi.python.org

I'd be surprised if many CI tools did a lot of downloading; the build tools are typically responsible for that. I'd expect them to show up a lot.

More importantly, I'm not sure what you mean by "user downloads". If I cause my build tool to download a package from PyPI, whether once or a thousand times, that still seems like a user download to me. (zc.buildout, at least, supports an effective caching strategy, so I doubt the package download numbers would change all that much.)

If I want to try out a package for some purpose, I'm going to add it to the build of the project I expect it to be useful for; if it proves insufficiently useful, I'll remove it.

My point is that if you don't include the downloads from zc.buildout, or whatever tool someone is using, you're likely to miss them completely, because what's *not* happening is a browser-based download. I can't even remember the last time I've done that for something available via PyPI; it's been many years.

-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
"A storm broke loose in my mind." --Albert Einstein
On 27/03/2011 16:01, Fred Drake wrote:
> I don't object to this, but I don't know that it tells you anything.
> It is better than everything showing up as a module from the Python
> standard library, which clearly doesn't tell us much.

Are you talking about the fact that the client code will be provided by packaging.index? If so, I guess we can think about a mechanism for 3rd parties to declare what type of client originated the request.

> I'd be surprised if many CI tools did a lot of downloading; the build
> tools are typically responsible for that. I'd expect them to show up
> a lot.

True.

> More importantly, I'm not sure what you mean by "user downloads". If
> I cause my build tool to download a package from PyPI, whether once or
> a thousand times, that still seems like a user download to me.
> (zc.buildout, at least, supports an effective caching strategy, so I
> doubt the package download numbers would change all that much.)

I was thinking about the opposition between automated uses (build tools) and casual uses (a deployment). But having given it more thought, it seems difficult to isolate one from the other (I would appreciate any ideas about that).

> If I want to try out a package for some purpose, I'm going to add it
> to the build of the project I expect it to be useful for; if it proves
> insufficiently useful, I'll remove it.
>
> My point is that if you don't include the downloads from zc.buildout,
> or whatever tool someone is using, you're likely to miss them
> completely, because what's *not* happening is a browser-based
> download. I can't even remember the last time I've done that for
> something available via PyPI; it's been many years.

That's true: browser-based downloads are a real minority of the use cases. If we want to have a way to distinguish between "normal use" and repeated use (which can skew the statistics), we need the build tools (or whatever does the download repeatedly) to declare themselves as such, right?

--
Alexis - http://notmyidea.org
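One way the "declare themselves" idea might look on the counting side, as a rough sketch: it assumes a made-up convention where a client puts `mode=automated` or `mode=interactive` in the parenthesised comment of its User-Agent. No tool is known to do this; the token names and sample strings are hypothetical.

```python
import re
from collections import Counter

# Hypothetical convention: "mode=automated" or "mode=interactive" appears
# inside the parenthesised comment part of the User-Agent string.
_COMMENT = re.compile(r"\(([^)]*)\)")
_MODES = ("automated", "interactive")


def declared_mode(user_agent):
    """Return the mode a client declared, or 'undeclared' if it gave none."""
    for comment in _COMMENT.findall(user_agent or ""):
        for token in comment.split(";"):
            key, _, value = token.strip().partition("=")
            if key == "mode" and value in _MODES:
                return value
    return "undeclared"


def count_by_mode(user_agents):
    """Tally hits per declared mode."""
    return Counter(declared_mode(ua) for ua in user_agents)


# Illustrative User-Agent values only.
sample = [
    "example-build/1.0 (mode=automated)",
    "example-installer/0.1 (linux; mode=interactive)",
    "Python-urllib/2.7",
]
print(dict(count_by_mode(sample)))
```

Anything that does not declare a mode lands in the "undeclared" bucket, which reflects the difficulty raised above: without cooperation from the tools, the two kinds of use cannot be told apart from the request alone.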
