Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prcomp_irlba #23

Closed
erichson opened this issue Sep 7, 2017 · 5 comments
Closed

prcomp_irlba #23

erichson opened this issue Sep 7, 2017 · 5 comments

Comments

@erichson
Copy link

erichson commented Sep 7, 2017

You give the following example for the prcomp_irlba, however, the results are not consistent:

> set.seed(1)
> x  <- matrix(rnorm(200), nrow=20)
> p1 <- prcomp_irlba(x, n=3)
> summary(p1)
Importance of components%s:
                         PC1    PC2    PC3
Standard deviation     1.541 1.2513 1.1916
Proportion of Variance 0.443 0.2921 0.2649
Cumulative Proportion  0.443 0.7351 1.0000
> p2 <- prcomp(x, tol=0.7)
> summary(p2)
Importance of first k=3 (out of 10) components:
                          PC1    PC2    PC3
Standard deviation     1.5411 1.2513 1.1916
Proportion of Variance 0.2806 0.1850 0.1678
Cumulative Proportion  0.2806 0.4656 0.6334
@bwlewis
Copy link
Owner

bwlewis commented Oct 2, 2017

Thanks for noting this! I will add a comment in the documentation for prcomp_irlba explitly about this.

It's not an error -- the differences are due to the fact that prcomp_irlba computes something different than prcomp -- it computes a truncated principal components decomposition. Thus the cumulative summaries are not known and cannot be computed.

The current documentation has some notes on the differences, but doesn't mention your example so I will add that. Thanks for pointing it out, the added documentation will help others.

Best,

Bryan

bwlewis added a commit that referenced this issue Oct 2, 2017
@bwlewis
Copy link
Owner

bwlewis commented Oct 2, 2017

OK, see ?prcomp_irlba and example(prcomp_irlba). Let me know if you think that the exampe can be better. Feel free to send a PR!

@erichson
Copy link
Author

erichson commented Oct 2, 2017

Bryan, that is certainly a fix, but I would suggest the approach I take for computing the randomized principal component analysis: https://github.com/Benli11/rSVD/blob/master/R/rpca.R, line 203.

a) you compute the total variation of the input data as sigma = sum( apply( Re(A) , 2, stats::var ), where A denotes the input matrix.

b) Then you can compute the explained variance ratio as sdev**2 / sigma, where sdev denotes the square root of the approximated dominant k eigenvalues.

Best,
Ben

@bwlewis
Copy link
Owner

bwlewis commented Oct 5, 2017

Hi Ben,

Thanks! That didn't occur to me.

I didn't realize someone else had implemented the Martinsson, etc. method in R. I guess we have two implementations for now. I added references to your package in the help and vingette.

I'm wondering if I should just remove that algorithm from the irlba package in the medium term, and simply point to your package in the future? Would you consider adding a convergence implementation similar to the one in rsvd? Perhaps I can send a pull request sometime... What would you prefer?

I have found that block methods are superior for pathological problems with clustered singular values, see the examples in irlba. That's why I ended up putting the Martinsson method in there in the first place. Ultimately I'd like to have a solid block irlba implementation in there, but I've had trouble getting something that works well in general (the nice original paper by Baglama and Reichel, http://www.math.uri.edu/~jbaglama/papers/paper17.pdf, has some implementation issues that I have never resolved that result in poor performance sometimes.)

Again, thanks for helping to improve irlba with your suggestion. Further collaboration would be great! *

Best,

Bryan

  • @benli11 perhaps you might be interested in co-maintaining irlba? just a thought...

@erichson
Copy link
Author

erichson commented Oct 6, 2017

Bryan, thanks for adding the references. That is really great! Whether you depreciate the rsvd implementation is up to you, of course. Anyway, I would be happy to add variants of the rsvd if you send a pull request. I think that can make the rsvd package stronger.

I would be very happy in helping you co-maintaining the irlba package, if I can make helpful contributions. I send you you an email regarding ideas for a future collaboration. Also, I am looking forward to discuss some of the pathological problems, and see if we can come up with some better implementations.

Best,
Ben

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants