I do not have access to the right _hierarchy.py source file

pegah Aliz
Hello Everybody,

This question seems simple, but I can't find the solution:

I use scipy.cluster.hierarchy to perform hierarchical clustering on a set of points using the "cosine" metric. As an example, I have:


import numpy as np
import scipy.cluster.hierarchy as hac
import matplotlib.pyplot as plt

# 15 two-dimensional points; the last one is the zero vector
Points = np.array([[0.        ,  0.23508573],
                   [0.00754775,  0.26717266],
                   [0.00595464,  0.27775905],
                   [0.01220563,  0.23622067],
                   [0.00542628,  0.14185873],
                   [0.03078922,  0.11273108],
                   [0.06707743, -0.1061131 ],
                   [0.04411757, -0.10775407],
                   [0.01349434,  0.00112159],
                   [0.04066034,  0.11639591],
                   [0.        ,  0.29046682],
                   [0.07338036,  0.00609912],
                   [0.01864988,  0.0316196 ],
                   [0.        ,  0.07270636],
                   [0.        ,  0.        ]])

# Complete-linkage clustering on cosine distances, then cut the tree at distance 0.1
z = hac.linkage(Points, metric='cosine', method='complete')
labels = hac.fcluster(z, 0.1, criterion="distance")

# Color each point by its flat-cluster label
plt.scatter(Points[:, 0], Points[:, 1], c=labels.astype(float))
plt.show()


Since I use the cosine metric, in some cases the dot product of two vectors can be negative, or the norm of a vector can be zero. As a result, the output z contains some negative or infinite entries, which are not valid for fcluster (see below):

z =
[[  0.00000000e+00   1.00000000e+01   0.00000000e+00   2.00000000e+00]
 [  1.30000000e+01   1.50000000e+01   0.00000000e+00   3.00000000e+00]
 [  8.00000000e+00   1.10000000e+01   4.26658708e-13   2.00000000e+00]
 [  1.00000000e+00   2.00000000e+00   2.31748880e-05   2.00000000e+00]
 [  3.00000000e+00   4.00000000e+00   8.96700489e-05   2.00000000e+00]
 [  1.60000000e+01   1.80000000e+01   3.98805492e-04   5.00000000e+00]
 [  1.90000000e+01   2.00000000e+01   1.33225099e-03   7.00000000e+00]
 [  5.00000000e+00   9.00000000e+00   2.41120340e-03   2.00000000e+00]
 [  6.00000000e+00   7.00000000e+00   1.52914684e-02   2.00000000e+00]
 [  1.20000000e+01   2.20000000e+01   3.52441432e-02   3.00000000e+00]
 [  2.10000000e+01   2.40000000e+01   1.38662986e-01   1.00000000e+01]
 [  1.70000000e+01   2.30000000e+01   6.99056531e-01   4.00000000e+00]
 [  2.50000000e+01   2.60000000e+01   1.92543748e+00   1.40000000e+01]
 [ -1.00000000e+00   2.70000000e+01              inf   1.50000000e+01]]
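To see which inputs actually make the cosine distance undefined, one can compute it by hand with plain NumPy (this is not SciPy's own implementation). The sketch below finds the zero-length rows and compares a pair with a negative dot product against a pair involving the zero vector. It suggests that a negative dot product only pushes the distance above 1 (cosine distance stays within [0, 2]), whereas a zero-norm point gives 0/0, i.e. NaN. The helper `cosine_distance` is a hypothetical name introduced here for illustration.

```python
import numpy as np

points = np.array([[0.        ,  0.23508573],
                   [0.00754775,  0.26717266],
                   [0.00595464,  0.27775905],
                   [0.01220563,  0.23622067],
                   [0.00542628,  0.14185873],
                   [0.03078922,  0.11273108],
                   [0.06707743, -0.1061131 ],
                   [0.04411757, -0.10775407],
                   [0.01349434,  0.00112159],
                   [0.04066034,  0.11639591],
                   [0.        ,  0.29046682],
                   [0.07338036,  0.00609912],
                   [0.01864988,  0.0316196 ],
                   [0.        ,  0.07270636],
                   [0.        ,  0.        ]])

norms = np.linalg.norm(points, axis=1)
zero_rows = np.flatnonzero(norms == 0.0)  # indices of zero-length vectors


def cosine_distance(i, j):
    """Hypothetical helper: 1 - cos(angle) between rows i and j (NaN if a norm is 0)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return 1.0 - points[i] @ points[j] / (norms[i] * norms[j])


d_negative = cosine_distance(0, 6)  # dot product < 0, distance still finite
d_zero = cosine_distance(0, 14)     # zero-norm point: 0/0 gives NaN

print("zero-norm rows:", zero_rows)
print("negative dot product:", d_negative)
print("with zero vector:", d_zero)
```

If this reading is right, the single zero vector (row 14) is the likely source of the non-finite values, rather than the negative dot products.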

To solve this problem, I looked into the linkage() function, and inside it I needed to check the _hierarchy.linkage() method. I use the PyCharm editor, and when I asked for the "linkage" source code, it opened a Python file named "_hierarchy.py" in a directory like the following:

.PyCharm40/system/python_stubs/-1247972723/scipy/cluster/_hierarchy.py
 
None of the functions in this Python file has an actual definition.
I am wondering where the correct source of this function is, so that I can revise it and solve my problem.
I would appreciate it if someone could help me find the correct source.

Thanks and Regards
Pegah


Zachary Ware-2
On May 17, 2015 11:20 AM, "pegah Aliz" <pegah.alizadeh at gmail.com> wrote:
>
> Hello Everybody,
>
> This question seems simple, but I can't find the solution:

<snip>

> To solve this problem, I looked into the linkage() function, and inside it
> I needed to check the _hierarchy.linkage() method. I use the PyCharm
> editor, and when I asked for the "linkage" source code, it opened a Python
> file named "_hierarchy.py" in a directory like the following:
>
> .PyCharm40/system/python_stubs/-1247972723/scipy/cluster/_hierarchy.py
>
> None of the functions in this Python file has an actual definition.
> I am wondering where the correct source of this function is, so that I can
> revise it and solve my problem.
> I would appreciate it if someone could help me find the correct source.

What you're seeing is basically an implementation detail of PyCharm: they
include "stub" files for many standard-library and third-party packages
that have C source, in order to provide parameter completion and other
features. Chances are very good that you actually need '_hierarchy.c' from
somewhere in SciPy's source; unfortunately I can't help you beyond telling
you that.

Hope this helps,
--
Zach
(On a phone)
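One way to check this explanation is to ask Python where the module was really loaded from. The sketch below assumes SciPy is installed and imports `scipy.cluster._hierarchy`, a private module whose name may change between releases. It prints the module's file and checks that the file ends in one of the interpreter's extension-module suffixes (such as `.so` on Linux), which would mean it is compiled code with no Python source for PyCharm to display. As far as we know, in recent SciPy source trees this extension is generated from a Cython file, `scipy/cluster/_hierarchy.pyx`.

```python
import importlib.machinery

import scipy.cluster._hierarchy as _hierarchy  # private SciPy module

# Path of the file the module was actually loaded from
module_path = _hierarchy.__file__

# Compiled extension modules end in a platform-specific suffix,
# e.g. ".cpython-312-x86_64-linux-gnu.so" on Linux or ".pyd" on Windows
is_extension = module_path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))

print(module_path)
print("compiled extension module:", is_extension)
```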

Gary Herron-2
In reply to this post by pegah Aliz
On 05/17/2015 09:18 AM, pegah Aliz wrote:

...

> To solve this problem, I looked into the linkage() function, and inside it I needed to check the _hierarchy.linkage() method. I use the PyCharm editor, and when I asked for the "linkage" source code, it opened a Python file named "_hierarchy.py" in a directory like the following:
>
> .PyCharm40/system/python_stubs/-1247972723/scipy/cluster/_hierarchy.py
>
> None of the functions in this Python file has an actual definition.
> I am wondering where the correct source of this function is, so that I can revise it and solve my problem.
> I would appreciate it if someone could help me find the correct source.
>
> Thanks and Regards
> Pegah

Please tell us:

  * What platform you are on: Linux, Windows, ...
  * How you installed PyCharm
  * What contents that file currently has
  * Why you think it's incorrect.

I think it's far more likely that that file is correct, and you are
somehow misinterpreting its contents, but we can't even begin to guess
until you show us its current content.

Gary

--
Dr. Gary Herron
Department of Computer Science
DigiPen Institute of Technology
(425) 895-4418

Peter Otten
In reply to this post by pegah Aliz
pegah Aliz wrote:

> To solve this problem, I looked into the linkage() function, and inside it
> I needed to check the _hierarchy.linkage() method. I use the PyCharm
> editor, and when I asked for the "linkage" source code, it opened a Python
> file named "_hierarchy.py" in a directory like the following:
>
> .PyCharm40/system/python_stubs/-1247972723/scipy/cluster/_hierarchy.py
>
> None of the functions in this Python file has an actual definition.
> I am wondering where the correct source of this function is, so that I can
> revise it and solve my problem. I would appreciate it if someone could
> help me find the correct source.

The actual _hierarchy module is probably written in C. Before you turn to
hacking that, you should ask for advice on a SciPy newsgroup or mailing
list. Perhaps the experts can point you to a better way to approach the
underlying problem.
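Along the lines Peter suggests, the underlying problem can probably be handled without modifying compiled code. Because the cosine distance is undefined only for zero-length vectors, one option is to set those points aside, cluster the rest, and give them a label of their own. The sketch below is one possible workaround, not an official SciPy recipe; treating zero vectors as a separate group with label 0 is our own assumption about what suits this data. It uses only the public functions `scipy.spatial.distance.pdist`, `scipy.cluster.hierarchy.linkage` and `fcluster`.

```python
import numpy as np
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import pdist

points = np.array([[0.        ,  0.23508573],
                   [0.00754775,  0.26717266],
                   [0.00595464,  0.27775905],
                   [0.01220563,  0.23622067],
                   [0.00542628,  0.14185873],
                   [0.03078922,  0.11273108],
                   [0.06707743, -0.1061131 ],
                   [0.04411757, -0.10775407],
                   [0.01349434,  0.00112159],
                   [0.04066034,  0.11639591],
                   [0.        ,  0.29046682],
                   [0.07338036,  0.00609912],
                   [0.01864988,  0.0316196 ],
                   [0.        ,  0.07270636],
                   [0.        ,  0.        ]])

# Cosine distance is 0/0 for zero-length vectors, so split them off first
nonzero = np.linalg.norm(points, axis=1) > 0.0

# Condensed cosine distance matrix of the remaining points; clip away any
# tiny negative values that floating-point rounding might produce
dm = np.clip(pdist(points[nonzero], metric="cosine"), 0.0, None)

# Complete linkage on the precomputed distances, cut at distance 0.1
z = hac.linkage(dm, method="complete")
sub_labels = hac.fcluster(z, 0.1, criterion="distance")

# Assumption: zero vectors get label 0 (fcluster labels start at 1)
labels = np.zeros(len(points), dtype=int)
labels[nonzero] = sub_labels

print(labels)
```

Whether zero vectors should form their own group, be merged into the nearest cluster, or be dropped entirely is a modelling decision that depends on what the points represent.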





pegah Aliz
In reply to this post by pegah Aliz
On Sunday, May 17, 2015 at 6:18:51 PM UTC+2, pegah Aliz wrote:

> <snip>



1 - The platform is Linux.
2 - After downloading the .tar file, building it and configuring it, I start PyCharm with pycharm.sh.
3 - These are the contents of _hierarchy.py:

# encoding: utf-8
# module scipy.cluster._hierarchy
# from /users/alizadeh/.local/lib/python2.7/site-packages/scipy/cluster/_hierarchy.so
# by generator 1.136
# no doc

# imports
import __builtin__ as __builtins__ # <module '__builtin__' (built-in)>
import numpy as np # /usr/lib/pymodules/python2.7/numpy/__init__.pyc

# functions

def calculate_cluster_sizes(*args, **kwargs): # real signature unknown
    """
    Calculate the size of each cluster. The result is the fourth column of
        the linkage matrix.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix. The fourth column can be empty.
        cs : ndarray
            The array to store the sizes.
        n : ndarray
            The number of observations.
    """
    pass

def cluster_dist(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by distance criterion.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        cutoff : double
            Clusters are formed when distances are less than or equal to `cutoff`.
        n : int
            The number of observations.
    """
    pass

def cluster_in(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by inconsistent criterion.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        R : ndarray
            The inconsistent matrix.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        cutoff : double
            Clusters are formed when the inconsistent values are less than
            or equal to `cutoff`.
        n : int
            The number of observations.
    """
    pass

def cluster_maxclust_dist(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by maxclust criterion.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        n : int
            The number of observations.
        mc : int
            The maximum number of clusters.
    """
    pass

def cluster_maxclust_monocrit(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by maxclust_monocrit criterion.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        MC : ndarray
            The monotonic criterion array.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        n : int
            The number of observations.
        max_nc : int
            The maximum number of clusters.
    """
    pass

def cluster_monocrit(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by monocrit criterion.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        MC : ndarray
            The monotonic criterion array.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        cutoff : double
            Clusters are formed when the MC values are less than or equal to
            `cutoff`.
        n : int
            The number of observations.
    """
    pass

def cophenetic_distances(*args, **kwargs): # real signature unknown
    """
    Calculate the cophenetic distances between each observation
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        d : ndarray
            The condensed matrix to store the cophenetic distances.
        n : int
            The number of observations.
    """
    pass

def get_max_dist_for_each_cluster(*args, **kwargs): # real signature unknown
    """
    Get the maximum inconsistency coefficient for each non-singleton cluster.
   
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        MD : ndarray
            The array to store the result.
        n : int
            The number of observations.
    """
    pass

4 - Because hierarchy.py contains a line like this:

      _hierarchy.linkage(dm, Z, n,
                         int(_cpy_non_euclid_methods[method]))

and the value of Z is different before and after that call.
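Since the symptom is a linkage matrix that fcluster rejects, it may help to validate Z right after linkage() returns instead of reading the compiled code. SciPy provides `scipy.cluster.hierarchy.is_valid_linkage` for this check. The sketch below uses two hand-made three-observation matrices (illustrative values, not taken from the thread): a well-formed one, and one whose last row mimics the `-1` cluster index and `inf` distance in the final row of the z matrix from the first message.

```python
import numpy as np
import scipy.cluster.hierarchy as hac

# Well-formed linkage for 3 observations: merge 0 and 1 into cluster 3,
# then merge observation 2 with cluster 3
z_good = np.array([[0.0, 1.0, 0.5, 2.0],
                   [2.0, 3.0, 1.0, 3.0]])

# Same shape, but the last row uses a cluster index of -1 and an
# infinite distance, like the last row of z in the original post
z_bad = np.array([[0.0, 1.0, 0.5, 2.0],
                  [-1.0, 3.0, np.inf, 3.0]])

print(hac.is_valid_linkage(z_good))  # True
print(hac.is_valid_linkage(z_bad))   # False
```

Passing `throw=True` makes `is_valid_linkage` raise an exception whose message names the problem (for example, negative indices) instead of returning False.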



pegah Aliz
In reply to this post by pegah Aliz
On Sunday, May 17, 2015 at 6:18:51 PM UTC+2, pegah Aliz wrote:

> <snip>

@Peter Thank you. I will do that.