Possible bug in plotSimilarityMatrix #3

Open
fanavarro opened this issue Mar 31, 2022 · 1 comment

Comments

fanavarro commented Mar 31, 2022

Hi, I'm testing this tool and I find it very interesting; however, I'm having a little problem (I am not sure if this is a bug or if I am missing something).

I have a similarity matrix that I calculated by applying the Jaccard similarity to my data. In R this matrix is stored in a data frame, where identical individuals have a similarity of 1 and completely distinct individuals have a similarity of 0. I am using the function plotSimilarityMatrix and the result seems to be correct:
[image]

Nonetheless, I tried to recreate the clustering with hclust. This function needs a dist object, so I computed 1 - my similarity matrix, so that a similarity of 1 becomes a distance of 0 and a similarity of 0 becomes a distance of 1, and then called as.dist(myDistanceMatrix) to get a dist object for hclust. I used hclust's default method (complete linkage); however, the resulting clustering is not as nice as the one I got before:
[image]
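
The conversion described above can be sketched in base R as follows. The 3x3 matrix and the names `sim`, `d`, and `hc` are hypothetical and only illustrate the steps; they are not the data from this issue.

```r
# Toy Jaccard-style similarity matrix (made-up values, for illustration only)
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.2,
                0.1, 0.2, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# If the similarities are stored in a data frame, coerce to a matrix first
sim <- as.matrix(sim)

# Similarity 1 -> distance 0, similarity 0 -> distance 1
d <- as.dist(1 - sim)

# hclust uses the dist object as given; "complete" is its default linkage
hc <- hclust(d, method = "complete")
print(hc$merge)
print(hc$height)
```

Note that hclust does not recompute any distance here: the tree is built directly from the 1 - similarity values.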

I do not know which clustering is the correct one, but I have checked the code of the function plotSimilarityMatrix and it uses the pheatmap library. If I am not wrong, the similarity matrix received as input by plotSimilarityMatrix is passed to pheatmap. I dug into the pheatmap function and found the following code used for calculating the dendrogram:

cluster_mat = function(mat, distance, method){
    if(!(method %in% c("ward.D", "ward.D2", "ward", "single", "complete", "average", "mcquitty", "median", "centroid"))){
        stop("clustering method has to one form the list: 'ward', 'ward.D', 'ward.D2', 'single', 'complete', 'average', 'mcquitty', 'median' or 'centroid'.")
    }
    if(!(distance[1] %in% c("correlation", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")) & class(distance) != "dist"){
        stop("distance has to be a dissimilarity structure as produced by dist or one measure  form the list: 'correlation', 'euclidean', 'maximum', 'manhattan', 'canberra', 'binary', 'minkowski'")
    }
    if(distance[1] == "correlation"){
        d = as.dist(1 - cor(t(mat)))
    }
    else{
        if(class(distance) == "dist"){
            d = distance
        }
        else{
            d = dist(mat, method = distance)
        }
    }
    
    return(hclust(d, method = method))
}

This code checks whether the distance argument is a dist object. I think that in this case it would never be a dist object, because the function plotSimilarityMatrix expects a similarity matrix, not a dissimilarity one. Thus, the above function from pheatmap assumes that the input matrix contains data, not distances, and it calculates a distance matrix through d = dist(mat, method = distance). The clustering shown in the plot from plotSimilarityMatrix therefore results from calculating the distance among the rows of the input similarity matrix.
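
A small base R sketch of the difference, assuming the distance method in effect is pheatmap's default, "euclidean" (whether plotSimilarityMatrix overrides it is not checked here). The toy matrix and variable names are hypothetical.

```r
# Toy similarity matrix (made-up values, for illustration only)
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.2,
                0.1, 0.2, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Distances taken directly from the similarities
d_direct <- as.dist(1 - sim)

# What dist(mat, method = distance) computes when mat is the similarity
# matrix: Euclidean distance between its rows, treated as feature vectors
d_rows <- dist(sim, method = "euclidean")

print(d_direct)
print(d_rows)
```

The two dist objects hold different values for the same pairs, so hclust is not guaranteed to produce the same dendrogram from them.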

Am I correct? I hope I have misunderstood something, because I really like the first plot produced by your library, much more than the one I obtained afterwards by applying hclust.

Kind regards,
Francisco Abad.

fanavarro commented Apr 1, 2022

I figured out that you can pass hclust objects as the clc and clr parameters, so that the clusters are drawn as specified by an external hclust execution. Thus, I have solved this by performing a clustering with hclust first and using the result as the parameter for the plotSimilarityMatrix function:

finding_hclust = hclust(as.dist(finding_dissimilarity_matrix), method='complete')
plotSimilarityMatrix(finding_similarity_matrix, clr=finding_hclust, clc=finding_hclust, showObsNames=T)

[image]
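
For comparison, the same idea can be sketched against pheatmap directly, which accepts an hclust object for cluster_rows and cluster_cols. This is only a sketch on a hypothetical toy matrix; it assumes pheatmap is installed and does not go through plotSimilarityMatrix.

```r
# Toy similarity matrix (made-up values, for illustration only)
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.2,
                0.1, 0.2, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Cluster on 1 - similarity, outside of the plotting function
hc <- hclust(as.dist(1 - sim), method = "complete")

# Reuse the precomputed tree for both rows and columns.
# Guarded so the sketch still runs when pheatmap is not installed.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(sim, cluster_rows = hc, cluster_cols = hc)
}
```

Passing the dist object through clustering_distance_rows and clustering_distance_cols should be another route, since the cluster_mat code quoted above uses a dist object as given.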
