Thoughts on more detailed stats

Thoughts on more detailed stats

Tarek Ziadé
Hey,

I find the current download counts quite artificial, because some
build systems out there fetch releases all day long as part of their
work. There are local mirrors of course, but I am pretty sure projects
like zc.buildout are downloaded most of the time by build scripts. And
setuptools is downloaded mostly as a dependency of other projects.

Those are valid stats of course, but I was wondering if we could
provide more detail about why a package was downloaded, e.g. whether
we can distinguish automated downloads from other downloads.

One way I was thinking of was to tell PyPI at download time whether
the download was a dependency fetch or a primary download (a manual
download or "pip install xxx").

Another way would be to ask Continuous Integration systems to use a
specific user agent marker.
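
As a rough illustration of the two ideas above, here is a minimal client-side sketch using only the standard library. The "(ci)" token in the User-Agent, the X-Download-Reason header, the tool name and the URL are all hypothetical names made up for this example; nothing here is defined by PyPI.

```python
from urllib.request import Request

# Hypothetical values: neither the "(ci)" token nor the X-Download-Reason
# header is defined by PyPI; they only illustrate the idea.
REASONS = ("primary", "dependency")


def build_download_request(url, tool, reason, automated=False):
    """Return a Request that declares who is downloading and why."""
    if reason not in REASONS:
        raise ValueError("unknown reason: %r" % (reason,))
    # Automated systems append a marker to their usual User-Agent.
    user_agent = tool + (" (ci)" if automated else "")
    return Request(url, headers={
        "User-Agent": user_agent,
        "X-Download-Reason": reason,
    })


# Placeholder URL and tool name; nothing is sent over the network here.
req = build_download_request(
    "http://pypi.example/packages/source/x/xxx/xxx-1.0.tar.gz",
    tool="examplebuild/1.0", reason="dependency", automated=True)
print(req.get_header("User-agent"))
print(req.get_header("X-download-reason"))
```

The request is only constructed, never sent, so the sketch shows the shape of the declaration rather than any agreed protocol.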

In the UI we could then make the distinction in the download hits between:

1/ downloads by the end users to install the project
2/ downloads by build tools.
3/ "indirect" downloads as dependencies
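
The three categories above could be tallied on the server side roughly as follows. This is only a sketch: the record keys "automated" and "reason" are assumptions about what a client might declare, and letting "automated" take precedence over "dependency" is one possible choice among several.

```python
from collections import Counter

END_USER, BUILD_TOOL, DEPENDENCY = "end user", "build tool", "dependency"


def classify(hit):
    """Map one download record to one of the three categories.

    `hit` is a dict with hypothetical keys: "automated" (True when the
    client announced itself as a build/CI system) and "reason"
    ("primary" or "dependency").
    """
    if hit.get("automated"):
        return BUILD_TOOL
    if hit.get("reason") == "dependency":
        return DEPENDENCY
    return END_USER


def tally(hits):
    """Count downloads per category."""
    return Counter(classify(hit) for hit in hits)


# Made-up sample records, for illustration only.
sample = [
    {"project": "xxx", "reason": "primary"},
    {"project": "setuptools", "reason": "dependency"},
    {"project": "zc.buildout", "reason": "primary", "automated": True},
    {"project": "setuptools", "reason": "dependency", "automated": True},
]
counts = tally(sample)
for category in (END_USER, BUILD_TOOL, DEPENDENCY):
    print(category, counts[category])
```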

This is still a bit vague in my head, but I think it would be valuable
for people to have such details.

Cheers
Tarek

--
Tarek Ziadé | http://ziade.org
_______________________________________________
Catalog-SIG mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/catalog-sig

Re: Thoughts on more detailed stats

Jim Fulton
On Tue, Mar 22, 2011 at 5:59 AM, Tarek Ziadé <[hidden email]> wrote:
> Hey,
>
> I find the current download counts quite artificial, because some
> build systems out there fetch releases all day long as part of their
> work. There are local mirrors of course,

Not just local mirrors, but source releases that include things, download
caches, etc...

> but I am
> pretty sure projects like zc.buildout are downloaded most of the time
> by build scripts. And setuptools is downloaded mostly as a dependency
> of other projects.
>
> Those are valid stats of course, but I was wondering if we could
> provide more detail about why a package was downloaded, e.g. whether
> we can distinguish automated downloads from other downloads.
>
> One way I was thinking of was to tell PyPI at download time whether
> the download was a dependency fetch or a primary download (a manual
> download or "pip install xxx").

I don't know why downloading something as part of a buildout would be any
different than doing a "pip install".  I almost never download anything except
with buildout.


> Another way would be to ask Continuous Integration systems to use a
> specific user agent marker.
>
> In the UI we could then make the distinction in the download hits between:
>
> 1/ downloads by the end users to install the project
> 2/ downloads by build tools.
> 3/ "indirect" downloads as dependencies
>
> This is still a bit vague in my head, but I think it would be valuable
> for people to have such details

I think it would help to ask what the goals of the statistics are?
The statistics are presumably used to answer some questions. What are
those questions?

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton

Re: Thoughts on more detailed stats

Tarek Ziadé
On Tue, Mar 22, 2011 at 11:33 AM, Jim Fulton <[hidden email]> wrote:
...
>
> I don't know why downloading something as part of a buildout would be any
> different than doing a "pip install".  I almost never download anything except
> with buildout.

Because when you run a buildout to install Plone, what you are really
doing is "installing Plone" -- downloading setuptools within this
process is just something the build tool does in order to work, and
does not necessarily mean the final app uses it.

A fresh buildout call w/ the bootstrap script == one hit to
zc.buildout + one hit to setuptools

When you do an explicit "pip install XXX" you are installing XXX as an end-user.


>
>
>> Another way would be to ask Continuous Integration systems to use a
>> specific user agent marker.
>>
>> In the UI we could then make the distinction in the download hits between:
>>
>> 1/ downloads by the end users to install the project
>> 2/ downloads by build tools.
>> 3/ "indirect" downloads as dependencies
>>
>> This is still a bit vague in my head, but I think it would be valuable
>> for people to have such details
>
> I think it would help to ask what the goals of the statistics are?
> The statistics are presumably used to answer some questions. What are
> those questions?

A/ Is my project, which provides end-user scripts but also modules
that can be reused by other apps:

1/ being installed by end users explicitly via easy_install, pip or a
direct distutils install
2/ just pulled in as a dependency of another project?

B/ Were the 126543265423 download hits I get for my project made
by automated build scripts or for installations?

C/ How can we differentiate the "end user" projects in PyPI from
build tools like zc.buildout or setuptools?

D/ For which projects is my project mostly downloaded as
a dependency?
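
To make question D concrete, here is a small sketch of the aggregation it implies. It assumes each download record carries a "requested for" field naming the project that triggered the fetch (None for an explicit install); that field is purely an assumption of this example, and the sample data is made up.

```python
from collections import Counter


def top_parents(hits, project, limit=3):
    """Rank the projects on whose behalf `project` was downloaded.

    Each hit is a (project, requested_for) pair; `requested_for` is None
    for an explicit install. The field is hypothetical.
    """
    parents = Counter(
        parent for name, parent in hits
        if name == project and parent is not None)
    return parents.most_common(limit)


# Made-up records, for illustration only.
hits = [
    ("setuptools", "Plone"),
    ("setuptools", "Plone"),
    ("setuptools", "zc.buildout"),
    ("setuptools", None),
    ("zc.buildout", None),
]
print(top_parents(hits, "setuptools"))
```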



> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
>



--
Tarek Ziadé | http://ziade.org

Re: Thoughts on more detailed stats

Thomas Lotze-2
Tarek Ziadé wrote:

> On Tue, Mar 22, 2011 at 11:33 AM, Jim Fulton <[hidden email]> wrote: ...
>> I think it would help to ask what the goals of the statistics are? The
>> statistics are presumably used to answer some questions. What are those
>> questions?
>
> A/ Is my project, which provides end-user scripts but also modules that
> can be reused by other apps:
>
> 1/ being installed by end users explicitly via easy_install, pip or a
> direct distutils install
> 2/ just pulled in as a dependency of another project?
[...]

These questions are already too technical for judging how the stats should
be computed. I think what consumers of the stats actually want to know and
what the stats therefore need to be able to answer in the end is more
along the lines of:

- Has my project ever been used by other people? Is it worth my time to
  make a nice distribution of it?

- Is my project still being used? How many people get mad at me if I make
  incompatible changes?

- How many hits does my project get compared with "the competition"?
  What's my "market share"? Am I cool? ;o)

Not that I'd find that stuff overly interesting myself, but unless there's
a really good reason to add more details to the stats, I'd strongly prefer
the interaction with PyPI to remain as simple as possible.

--
Thomas




Re: Thoughts on more detailed stats

Sebastien Douche
In reply to this post by Tarek Ziadé
On Tue, Mar 22, 2011 at 11:58, Tarek Ziadé <[hidden email]> wrote:

>> I don't know why downloading something as part of a buildout would be any
>> different than doing a "pip install".  I almost never download anything except
>> with buildout.
>
> Because when you run a buildout to install Plone, what you are really
> doing is "installing Plone" -- downloading setuptools within this
> process is just something the build tool does in order to work, and
> does not necessarily mean the final app uses it.

Hi Tarek,
from my point of view, this question is answered with categories:
- Plone is an application = downloaded for itself
- Python libs = dependency
- Buildout or setuptools = build tools
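
A sketch of how such categories could be applied to existing download counts, with no change to the clients. The mapping below is hypothetical and hand-written for the example (a real index might derive it from project metadata), and the counts are made up.

```python
from collections import defaultdict

# Hypothetical, hand-maintained mapping from project to category.
CATEGORIES = {
    "Plone": "application",
    "zc.buildout": "build tool",
    "setuptools": "build tool",
}


def downloads_by_category(download_counts, default="library"):
    """Sum per-project download counts into per-category totals."""
    totals = defaultdict(int)
    for project, count in download_counts.items():
        # Anything not listed is treated as a plain library (dependency).
        totals[CATEGORIES.get(project, default)] += count
    return dict(totals)


# Made-up numbers, for illustration only.
print(downloads_by_category(
    {"Plone": 10, "setuptools": 500, "zc.buildout": 300, "somelib": 40}))
```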



--
Sebastien Douche <[hidden email]>
Twitter: @sdouche (agile, lean, python, git, open source)

Re: Thoughts on more detailed stats

Alexis Métaireau-2
In reply to this post by Thomas Lotze-2
On 23/03/2011 06:22, Thomas Lotze wrote:
> Tarek Ziadé wrote:
>
>> On Tue, Mar 22, 2011 at 11:33 AM, Jim Fulton<[hidden email]>  wrote: ...
>>> I think it would help to ask what the goals of the statistics are? The
>>> statistics are presumably used to answer some questions. What are those
>>> questions?

Having a user agent defined in the clients connecting to PyPI would
also let us compute statistics on the usage of such tools (xx% of all
the downloads on pypi.python.org are made by buildout, by the
distutils2 index crawler, by pip, etc.)
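
For instance, per-tool shares could be computed from logged User-Agent strings along these lines. This is a sketch that assumes clients follow the usual "name/version" product-token convention; the sample strings are illustrative, not taken from real logs.

```python
from collections import Counter


def tool_share(user_agents):
    """Return {tool name: percentage of downloads} from User-Agent strings.

    Assumes the "name/version" convention: whatever precedes the first
    "/" is taken as the tool name; an empty string counts as "unknown".
    """
    tools = Counter(
        ua.split("/", 1)[0].strip() or "unknown" for ua in user_agents)
    total = sum(tools.values())
    if not total:
        return {}
    return {tool: 100.0 * n / total for tool, n in tools.items()}


# Illustrative sample, not real log data.
agents = ["pip/1.0", "pip/0.8", "zc.buildout/1.5", "Python-urllib/2.6"]
for tool, share in sorted(tool_share(agents).items()):
    print("%s: %.1f%%" % (tool, share))
```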

I'm +1 on having CI tools use specific HTTP headers so that their
downloads are not counted as "user downloads". We can probably store
this information in a different place and clearly display the number
of downloads by CI tools on pypi.python.org.

--
Alexis — http://notmyidea.org

Re: Thoughts on more detailed stats

Fred Drake-3
2011/3/27 Alexis Métaireau <[hidden email]>:
> Having a user agent defined in the clients connecting to PyPI would also
> let us compute statistics on the usage of such tools (xx% of all the
> downloads on pypi.python.org are made by buildout, by the distutils2 index
> crawler, by pip, etc.)

I don't object to this, but I don't know that it tells you anything.
It is better than everything showing up as a module from the Python
standard library, which clearly doesn't tell us much.

> I'm +1 on having CI tools use specific HTTP headers so that their
> downloads are not counted as "user downloads". We can probably store
> this information in a different place and clearly display the number
> of downloads by CI tools on pypi.python.org.

I'd be surprised if many CI tools did a lot of downloading; the build
tools are typically responsible for that.  I'd expect them to show up
a lot.

More importantly, I'm not sure what you mean by "user downloads".  If
I cause my build tool to download a package from PyPI, whether once or
a thousand times, that still seems like a user download to me.
(zc.buildout, at least, supports an effective caching strategy, so I
doubt the package download numbers would change all that much.)

If I want to try out a package for some purpose, I'm going to add it
to the build of the project I expect it to be useful for; if it proves
insufficiently useful, I'll remove it.

My point is that if you don't include the downloads from zc.buildout,
or whatever tool someone is using, you're likely to miss them
completely, because what's *not* happening is a browser-based
download.  I can't even remember the last time I've done that for
something available via PyPI; it's been many years.


  -Fred

--
Fred L. Drake, Jr.    <fdrake at acm.org>
"A storm broke loose in my mind."  --Albert Einstein

Re: Thoughts on more detailed stats

Alexis Métaireau-2
On 27/03/2011 16:01, Fred Drake wrote:

> I don't object to this, but I don't know that it tells you anything.
> It is better than everything showing up as a module from the Python
> standard library, which clearly doesn't tell us much.

Are you talking about the fact that the client code will be provided by
packaging.index? If so, I guess we can think about a mechanism for third
parties to declare which type of client originated the request.

> I'd be surprised if many CI tools did a lot of downloading; the build
> tools are typically responsible for that.  I'd expect them to show up
> a lot.
True.

> More importantly, I'm not sure what you mean by "user downloads".  If
> I cause my build tool to download a package from PyPI, whether once or
> a thousand times, that still seems like a user download to me.
> (zc.buildout, at least, supports an effective caching strategy, so I
> doubt the package download numbers would change all that much.)

I was thinking about the distinction between automated use (build tools)
and casual use (a deployment). But having given it more thought, it
seems difficult to isolate one from the other (I would appreciate any
ideas about that).

> If I want to try out a package for some purpose, I'm going to add it
> to the build of the project I expect it to be useful for; if it proves
> insufficiently useful, I'll remove it.
>
> My point is that if you don't include the downloads from zc.buildout,
> or whatever tool someone is using, you're likely to miss them
> completely, because what's *not* happening is a browser-based
> download.  I can't even remember the last time I've done that for
> something available via PyPI; it's been many years.

That's true: browser-based downloads are a real minority of the use cases.

If we want a way to distinguish between "normal use" and repeated use
(which can skew the statistics), we need the build tools (or whatever
does the download repeatedly) to declare themselves as such, right?
--
Alexis — http://notmyidea.org