
Units of the covariance matrix #129

Open
agrayver opened this issue Feb 25, 2015 · 30 comments

@agrayver

After convergence, I can request the covariance matrix from CMASolutions. I wonder what these covariances represent. Are they in geno- or phenotype space? What do I need to do in order to get the covariance matrix in the original parameter units (more precisely, if my parameters have units of m, then the covariances should have units of m^2)?

@beniz beniz self-assigned this Feb 25, 2015
@beniz
Collaborator

beniz commented Feb 25, 2015

Good point. The covariance matrix is definitely in genotype space. The lib has no facility yet to bring it back into phenotype space. Note that the covariance is also scaled by the step-size sigma. At this point, I wonder if it wouldn't be easier to simply recompute the covariance in phenotype space from the solution candidates...

I will definitely look into this.

@nikohansen
Collaborator

I would consider looking at the correlation matrix rather than the covariance matrix. This also has the advantage of being unit-less, as it rescales the given coordinates to unit variance. In general, I find any such data comparatively hard to interpret, compared, for example, to the standard deviations along the principal axes.

If you are only interested in the standard deviations/variances in the given coordinates, they should be comparatively easy to get and interpret (they can be derived from the standard plots).

Re: genotype/phenotype. Linear transformations can be easily applied to the covariance matrix. Furthermore, the correlation matrix is invariant under parameter rescaling. I would expect that recomputing the covariance matrix from the distribution of the given (phenotype) candidate solutions will work poorly, because step-size changes can easily prevent one from seeing anything stable or reasonable. However, one could simply sample around 10 × dimension new solutions from the final distribution and do the computations on this sample.
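The covariance-to-correlation conversion suggested above can be sketched as follows. This is not a libcmaes API, just a minimal dependency-free illustration (with libcmaes one would do the equivalent on the Eigen matrix returned by `cov()`):

```cpp
#include <cmath>
#include <vector>

// Convert a covariance matrix C (row-major, n x n) into the unit-less
// correlation matrix R, with R(i,j) = C(i,j) / sqrt(C(i,i) * C(j,j)).
// The diagonal of R is 1 by construction, and R is invariant under any
// rescaling of the individual parameters.
std::vector<double> correlation(const std::vector<double>& C, int n) {
  std::vector<double> R(n * n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      R[i * n + j] = C[i * n + j] / std::sqrt(C[i * n + i] * C[j * n + j]);
  return R;
}
```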

@beniz
Collaborator

beniz commented Feb 25, 2015

Linear transformations can be easily applied to the covariance matrix

This applies to the linear scaling, but I was worried about the bound transforms, which IIRC are not linear close to the boundary.

However, one could simply sample around 10 × dimension new solutions from the final distribution and do the computations on this sample.

I believe I need to bring something like this into the lib, if only for ROOT users, because otherwise the covariance is in geno space.

@beniz
Collaborator

beniz commented Feb 25, 2015

If you are only interested in the standard deviations/variances in the given coordinates, they should be comparatively easy to get and interpret (they can be derived from the standard plots).

@agrayver FYI and before I put a getter for the std dev, here is how to get them:

// full matrix CMA-ES etc...
dVec stds = cmaparams.get_gp().pheno((dVec)cmasols.cov().sqrt().diagonal()).transpose();
// sep-CMA-ES
dVec stds = cmaparams.get_gp().pheno((dVec)cmasols.sepcov().cwiseSqrt()).transpose();
// VD-CMA-ES
dVec stds = cmaparams.get_gp().pheno(cmasols.sepcov()).transpose(); // C = D(I+vv')D, and we use D^2 as an approx, here sepcov() yields D

Note that I need to look again at VD-CMA-ES, as I don't remember why I left this approximation in.

EDIT: VD-CMA

beniz pushed a commit that referenced this issue Feb 25, 2015
…dard deviations + removed approximation of deviations when using vdcma, ref #129
@beniz
Collaborator

beniz commented Feb 25, 2015

There is a new CMASolutions::stds() function to get the standard deviations, and I've gotten rid of the approximation when running VD-CMA-ES. I am planning two more functions, one for the sampled variance, the other for the correlation matrix.

@agrayver
Author

Emmanuel & Nikolaus, thanks for the comments. I will use CMASolutions::stds() for now. Looking forward to having a method for the correlation matrix; that would be useful as well.

beniz pushed a commit that referenced this issue Feb 27, 2015
…ettings + full_cov() function that return full size covariance matrix, even with sep and vd algorithms, ref #129
@beniz
Collaborator

beniz commented Feb 27, 2015

The commits above add corr() for the full-size correlation matrix, and full_cov() for the full-size covariance matrix, still in genotype space but useful for VD-CMA since it returns the full covariance. I've also added corr(i,j) for accessing a single correlation without computing the full matrix, which is useful in large-scale settings.

@agrayver
Author

Great! It works here, thanks.

@agrayver
Author

agrayver commented Mar 6, 2015

I compared the standard deviations the library returns via stds(cmaparams) with those obtained by direct calculation over the best candidates from all iterations (~4000). They do not match at all. If this is supposed to be so, then I am still missing what the quantities returned by stds(cmaparams) represent. Sorry if I am missing something trivial.

@beniz
Collaborator

beniz commented Mar 6, 2015

I'm not sure exactly what you are doing, but in any case stds() currently does not rescale the values with sigma. Originally I did this on purpose in order to match the cov() call, which does not scale the covariance matrix either. But thinking about it again, I may want to provide an errors() call that would rescale the standard deviations with sigma. If you are now using the linear scaling geno/pheno transform, you should be able to check whether this is the issue by rescaling the results of stds with sqrt(sigma).

@nikohansen
Collaborator

Re stds: Without incorporating sigma, only their values relative to each other have a meaning; their absolute values are meaningless.

I compared the standard deviations the library returns via stds(cmaparams) with those obtained by direct calculation over the best candidates from all iterations (~4000).

I don't quite see what a variation computed over all iterations would tell us. What do you want to know?

The "true" standard deviations (i.e. including sigma) of the last iteration give us an estimate of the current distance (or error, if you like) to the (local) optimum, for which the current distribution mean is the estimator. The error in each coordinate is roughly std times max(1, sqrt(dimension)/mu). There is no guarantee that the estimate is correct, but it is a somewhat more reliable version of the EDM.
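The rule of thumb above can be sketched as a small hypothetical helper (not part of libcmaes; the input standard deviations are assumed to already include sigma):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-coordinate error estimate: err_i = std_i * max(1, sqrt(n) / mu),
// where n is the problem dimension (taken from the vector length) and
// mu is the number of parents. For mu >= sqrt(n) the factor is 1 and
// the standard deviations are used directly.
std::vector<double> coordinate_errors(const std::vector<double>& stds, int mu) {
  const double n = static_cast<double>(stds.size());
  const double factor = std::max(1.0, std::sqrt(n) / mu);
  std::vector<double> err(stds.size());
  for (std::size_t i = 0; i < stds.size(); ++i)
    err[i] = stds[i] * factor;
  return err;
}
```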

@agrayver
Author

agrayver commented Mar 8, 2015

My aim is to find a means of assessing the model I obtain after the minimum has been reached. For instance, I want to know the variance of the parameters, their interdependence, plot histograms, etc. Those are standard things people do when they apply inference methods such as MCMC. Since, on the way to the minimum, CMA-ES samples the parameter space quite a few times, this information could somehow be used. Maybe I am wrong and should consider CMA-ES merely a minimization method?

@nikohansen
Collaborator

Since, on the way to the minimum, CMA-ES samples the parameter space quite a few times, this information could somehow be used.

Generally, this depends to a large extent on the initial conditions, i.e. the initial parameters and standard deviations used.

Maybe I am wrong and should consider CMAES merely a minimization method?

Even though CMA-ES is quite similar to MCMC methods, CMA-ES will never sample from a stationary distribution. That makes, in some respects, a rather big difference. On a positive note, up to a multiplicative scalar factor, the final covariance matrix is often a stationary one.

@agrayver
Author

Generally, this depends to a large extent on the initial conditions, i.e. the initial parameters and standard deviations used.

Just as MCMC methods depend on a prior density function and a whole bunch of other tweaks.

Even though CMA-ES is quite similar to MCMC methods, CMA-ES will never sample from a stationary distribution. That makes under some aspects a rather big difference.

Thanks for pointing that out, I see now. Well, at least upon convergence, the statistics collected over the last n << n_iter steps should be valid, at least locally (assuming a quadratic approximation of the function)?

@nikohansen
Collaborator

Aggregating statistics of sampled points over several iterations remains questionable unless done with great care. This is, incidentally, exactly what CMA-ES does: it aggregates statistics over a certain backward time horizon of iterations, but with careful (time-variant) re-normalizations, such that the samples across iterations become comparable (i.e. stationary, if you like).

@agrayver
Author

@beniz I have tested errors(...) and the standard deviations it returns depend on the scaling strategy I use, e.g. they are (vastly) different for NoScalingStrategy and linScalingStrategy. Is this supposed to be so?

@nikohansen I see your point. I am currently trying to make sense of the values returned by the errors() method that Emmanuel introduced recently. From the physics of my system I know that some parameters are much worse determined than others, i.e. the physical system under consideration has different sensitivities for different parameters, and I believe this has to be reflected in the covariance matrix somehow (at least in MCMC and deterministic methods this is the case). However, errors() always gives me values like these:
1.00015
1.00016
1.00016
1.00016
1.00014
1.00014
1.00013
1.00012
1.00011
1.00011
1.0001
1.0001
1.0001
1.00009
1.00009
1.0001
1.00008
1.00008
1.00008

These values are not particularly insightful. Where am I going wrong?

@beniz
Collaborator

beniz commented Mar 12, 2015

I have tested errors(...) and the standard deviations it returns depend on the scaling strategy I use, e.g. they are (vastly) different for NoScalingStrategy and linScalingStrategy. Is this supposed to be so?

I guess they shouldn't be. Basically, errors() is stds() rescaled by sigma, on top of which the phenotype transform is applied...

@agrayver
Author

stds() may also be affected by this; I have not tested it. Here is what I get with NoScalingStrategy:
1.08905
1.08523
1.08039
...
and linScalingStrategy:
0.000113827
0.000113704
0.000113688

@beniz
Collaborator

beniz commented Mar 12, 2015

I can't test right now, but can you report what you are getting on the sphere function, for instance?

@nikohansen
Collaborator

@agrayver these are the errors you could get after a very small number of iterations with an initial sigma very close to one. Otherwise, you should get something different (more meaningful) with probability close to one; see e.g. #131

std deviations: [  4.24406277e-08   3.07314735e-08   2.21571269e-08   1.39661633e-08  1.23903235e-08   7.99126636e-09   6.43127205e-09   4.14453821e-09 ...]

@agrayver
Author

@nikohansen those are the values I am getting at the final (~3000th) iteration, when reaching the minimum. I guess one has to check these with/without bounds and/or scaling on some test functions.

@beniz
Collaborator

beniz commented Mar 13, 2015

OK, I believe I've got it. stds and errors are relative to the mean, so more care must be taken when applying the pheno transform. The solution is to bring both the mean and the mean plus error into phenotype space and take the difference there. I believe this should be generic enough to accommodate non-linear transforms as well.

I will commit a fix in coming days, but if you want to compute the errors properly with linear scaling right now, the code below should work fine:

dVec phen_xmean = cmaparams.get_gp().pheno(cmasols.xmean());
dVec stds = cmasols.cov().diagonal().cwiseSqrt();
dVec phen_xmean_std = cmaparams.get_gp().pheno(static_cast<dVec>(cmasols.xmean() + stds));
dVec est_errors = std::sqrt(cmasols.sigma())*(phen_xmean_std - phen_xmean);

where cmaparams is your CMAParameters object and cmasols your CMASolutions object. The code above yields the correct results for me with and without linear scaling.

@agrayver
Author

@beniz maybe you need to add fabs, because occasionally I am getting negative values for some parameters.

@agrayver
Author

It is also not clear to me how the statistics are affected by the non-linear transformations used inside the library. I mean, if you calculate statistics in the transformed space, how do you convert them back into parameter space? Simply applying the back-transformation will not (in general) give the correct answer.
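One way around this, in the spirit of the resampling idea discussed earlier in this thread, is to push samples (rather than the statistics themselves) through the transform. A minimal 1-D sketch, where the scalar `pheno` callable is a stand-in for the library's geno/pheno machinery, not its actual API:

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Estimate the phenotype-space standard deviation by drawing samples from
// the genotype-space distribution, mapping each sample through the (possibly
// non-linear) pheno transform, and computing the sample statistics there.
// For a non-linear transform this is generally NOT the same as transforming
// the genotype std directly.
double pheno_std(double geno_mean, double geno_std,
                 const std::function<double(double)>& pheno,
                 int nsamples = 10000, unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::normal_distribution<double> dist(geno_mean, geno_std);
  std::vector<double> ys(nsamples);
  double mean = 0.0;
  for (int i = 0; i < nsamples; ++i) {
    ys[i] = pheno(dist(gen));  // map each genotype sample to phenotype space
    mean += ys[i];
  }
  mean /= nsamples;
  double var = 0.0;
  for (double y : ys) var += (y - mean) * (y - mean);
  return std::sqrt(var / (nsamples - 1));  // unbiased sample std
}
```

The same idea extends to the multivariate case by sampling with a Cholesky factor of sigma^2 * C and computing the full sample covariance in phenotype space.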

@beniz
Collaborator

beniz commented Mar 16, 2015

FYI, a final fix is now available for stds and errors with scaling and geno/pheno linear/non-linear transforms.

@agrayver
Author

I have now resumed the experiments. I am still trying to understand the nature of the covariance matrix the library returns. I have constructed a convex, noise-free problem and found the minimum using a quasi-Newton method.
Here is how the inverse of the Hessian (which, up to a constant scaling factor, should be the covariance at the minimum) looks:
[image: newton]

The minimum point found by aCMA-ES is very similar to that from the quasi-Newton method, but the covariance matrix looks different:
[image: cmaes]

While some similarities are present, the matrices are far from identical. I wonder where this difference comes from?

@nikohansen
Collaborator

You might want to run the experiment with CMA-ES again with a different random seed and/or initial point, and compare the result to the above. This will show the amount of stochastic deviation involved, which I reckon is the reason for the difference.

EDIT: investigating the eigenvalues of the covariance matrix over time, as shown in the default plots, reveals whether a stable configuration has been reached. It also reveals, at least in part, the stochastic deviations/fluctuations we expect to see in the end by chance.

@agrayver
Author

Here are three more runs with different random seeds:
[image: cmaes1]
[image: cmaes2]
[image: cmaes3]

Admittedly, they are all similar. On the other hand, there is a unique Hessian at the minimum, from which I can get a locally valid covariance matrix.
The ultimate question, if it makes sense at all, is which one is correct or better?

@loshchil

if it makes sense at all, is which one is correct or better?

the average/median of normalized results from N>>1 runs?


@nikohansen
Collaborator

As we know the true Hessian matrix, we know which matrix is correct, and, depending on a distance measure of your choice, you can compute which one is better, i.e. closer to the true value.
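As a concrete example of such a distance measure (one choice among many, not something the library provides): normalize both matrices to unit trace to remove the unknown scalar factor, then compare them with the Frobenius norm:

```cpp
#include <cmath>
#include <vector>

// Scale-invariant comparison of two covariance-like matrices (row-major,
// n x n): divide each by its trace, so the unknown multiplicative factor
// between them drops out, then return the Frobenius norm of the difference.
// A result of 0 means the matrices are equal up to a scalar factor.
double normalized_frobenius_distance(const std::vector<double>& A,
                                     const std::vector<double>& B, int n) {
  double trA = 0.0, trB = 0.0;
  for (int i = 0; i < n; ++i) {
    trA += A[i * n + i];
    trB += B[i * n + i];
  }
  double d2 = 0.0;
  for (int k = 0; k < n * n; ++k) {
    const double diff = A[k] / trA - B[k] / trB;
    d2 += diff * diff;
  }
  return std::sqrt(d2);
}
```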

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016
…dard deviations + removed approximation of deviations when using vdcma, ref CMA-ES#129
andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016
…ettings + full_cov() function that return full size covariance matrix, even with sep and vd algorithms, ref CMA-ES#129