
Units of the covariance matrix #129

Open
agrayver opened this issue Feb 25, 2015 · 30 comments

@agrayver

After convergence, I can request the covariance matrix from CMASolutions. I wonder what these covariances represent. Are they in geno- or phenotype space? What do I need to do in order to get the covariance matrix in the original parameter units (more precisely, if my parameters have units of m, then the covariances should have units of m^2)?

@beniz beniz self-assigned this Feb 25, 2015
@beniz
Collaborator

beniz commented Feb 25, 2015

Good point. The covariance matrix is definitely in genotype space. The lib has no facility yet to bring it back into phenotype space. Note that the covariance is also scaled by the step-size sigma. At this point, I wonder if it wouldn't be easier to simply recompute the covariance in phenotype space from the solution candidates...

I will definitely look into this.

@nikohansen
Collaborator

I would consider looking at the correlation matrix rather than the covariance matrix. This also has the advantage of being unit-less, as it rescales the given coordinates to unit variance. In general, I find any such data comparatively hard to interpret, compared, for example, to the standard deviations along the principal axes.

If you are only interested in the standard deviations/variances in the given coordinates, they should be comparatively easy to get and interpret (they can be derived from the standard plots).

Re: genotype/phenotype. Linear transformations can be easily applied to the covariance matrix. Furthermore, the correlation matrix is invariant under parameter rescaling. I would expect that recomputing the covariance matrix from the distribution of the given (phenotype) candidate solutions will work poorly, because step-size changes can easily prevent one from seeing anything stable or reasonable. However, one could simply sample around 10 × dimension new solutions from the final distribution and do the computations on this sample.
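The covariance-to-correlation conversion suggested above can be sketched as follows. This is not a libcmaes API, just a minimal dependency-free illustration (with libcmaes one would do the equivalent on the Eigen matrix returned by `cov()`):

```cpp
#include <cmath>
#include <vector>

// Convert a covariance matrix C (row-major, n x n) into the unit-less
// correlation matrix R, with R(i,j) = C(i,j) / sqrt(C(i,i) * C(j,j)).
// The diagonal of R is 1 by construction, and R is invariant under any
// rescaling of the individual parameters.
std::vector<double> correlation(const std::vector<double>& C, int n) {
  std::vector<double> R(n * n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      R[i * n + j] = C[i * n + j] / std::sqrt(C[i * n + i] * C[j * n + j]);
  return R;
}
```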

@beniz
Collaborator

beniz commented Feb 25, 2015

Linear transformations can be easily applied to the covariance matrix

This applies to the linear scaling, but I was worried about the bound transforms, which IIRC are not linear close to the boundary.

However, one could simply sample around 10 × dimension new solutions from the final distribution and do the computations on this sample.

I believe I need to bring something like this into the lib, if only for ROOT users, because otherwise the covariance is in geno space.

@beniz
Collaborator

beniz commented Feb 25, 2015

If you are only interested in the standard deviations/variances in the given coordinates, they should be comparatively easy to get and interpret (they can be derived from the standard plots).

@agrayver FYI and before I put a getter for the std dev, here is how to get them:

// full matrix CMA-ES etc...
dVec stds = cmaparams.get_gp().pheno((dVec)cmasols.cov().sqrt().diagonal()).transpose();
// sep-CMA-ES
dVec stds = cmaparams.get_gp().pheno((dVec)cmasols.sepcov().cwiseSqrt()).transpose();
// VD-CMA-ES
dVec stds = cmaparams.get_gp().pheno(cmasols.sepcov()).transpose(); // C = D(I+vv')D, and we use D^2 as an approx, here sepcov() yields D

Note that I need to look again at VD-CMA-ES, as I don't remember why I left this approximation in.

EDIT: VD-CMA

beniz pushed a commit that referenced this issue Feb 25, 2015
…dard deviations + removed approximation of deviations when using vdcma, ref #129
@beniz
Collaborator

beniz commented Feb 25, 2015

There is a new CMASolutions::stds() function to get the standard deviations, and I've gotten rid of the approximation when running VD-CMA-ES. I am planning two more functions, one for the sampled variance, the other for the correlation matrix.

@agrayver
Author

Emmanuel & Nikolaus, thanks for the comments. I will use CMASolutions::stds() for now. Looking forward to having a method for the correlation matrix; that would be useful as well.

beniz pushed a commit that referenced this issue Feb 27, 2015
…ettings + full_cov() function that return full size covariance matrix, even with sep and vd algorithms, ref #129
@beniz
Collaborator

beniz commented Feb 27, 2015

The commits above add corr() for the full-size correlation matrix, and full_cov() for the full-size covariance matrix, still in genotype space but useful for VD-CMA since it returns the full covariance. I've also added corr(i,j) for accessing a single correlation without computing the full matrix, which is useful in large-scale settings.

@agrayver
Author

Great! It works here, thanks.

@agrayver
Author

agrayver commented Mar 6, 2015

I compared the standard deviations the library returns via stds(cmaparams) with those obtained by direct calculation over the best candidates from all iterations (~4000). They do not match at all. If this is supposed to be so, then I am still missing what the quantities returned by stds(cmaparams) represent. Sorry if I am missing something trivial.

@beniz
Collaborator

beniz commented Mar 6, 2015

I'm not sure exactly what you are doing, but in any case stds() currently does not rescale the values with sigma. Originally I did this on purpose in order to match the cov() call, which does not scale the covariance matrix either. But thinking about it again, I may want to provide an errors() call that would rescale the standard deviations with sigma. If you are now using the linear scaling geno/pheno transform, you should be able to check whether this is the issue by rescaling the results of stds with sqrt(sigma).

@nikohansen
Collaborator

Re stds: Without incorporating sigma, only their values relative to each other have a meaning; their absolute values are meaningless.

I compared the standard deviations the library returns via stds(cmaparams) with those obtained by direct calculation over the best candidates from all iterations (~4000).

I don't quite see what a variation computed over all iterations would tell us. What do you want to know?

The "true" standard deviations (i.e. including sigma) of the last iteration give us an estimate of the current distance (or error, if you like) to the (local) optimum, for which the current distribution mean is the estimator. The error in each coordinate is roughly std times max(1, sqrt(dimension)/mu). There is no guarantee that the estimate is correct, but it is a somewhat more reliable version of the EDM.
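The rule of thumb above can be sketched as a small hypothetical helper (not part of libcmaes; the input standard deviations are assumed to already include sigma):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-coordinate error estimate: err_i = std_i * max(1, sqrt(n) / mu),
// where n is the problem dimension (taken from the vector length) and
// mu is the number of parents. For mu >= sqrt(n) the factor is 1 and
// the standard deviations are used directly.
std::vector<double> coordinate_errors(const std::vector<double>& stds, int mu) {
  const double n = static_cast<double>(stds.size());
  const double factor = std::max(1.0, std::sqrt(n) / mu);
  std::vector<double> err(stds.size());
  for (std::size_t i = 0; i < stds.size(); ++i)
    err[i] = stds[i] * factor;
  return err;
}
```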

@agrayver
Author

agrayver commented Mar 8, 2015

My aim is to find a means of assessing the model I obtain after the minimum has been reached. For instance, I want to know the variance of the parameters, their interdependence, plot histograms, etc. Those are standard things people do when they apply inference methods such as MCMC. Since, on the way to the minimum, CMA-ES samples the parameter space quite a few times, this information could somehow be used. Maybe I am wrong and should consider CMA-ES merely a minimization method?

@nikohansen
Collaborator

Since, on the way to the minimum, CMA-ES samples the parameter space quite a few times, this information could somehow be used.

Generally, this depends to a large extent on the initial conditions, i.e. the initial parameters and standard deviations used.

Maybe I am wrong and should consider CMAES merely a minimization method?

Even though CMA-ES is quite similar to MCMC methods, CMA-ES will never sample from a stationary distribution. That makes, in some respects, a rather big difference. On a positive note, up to a multiplicative scalar factor, the final covariance matrix is often a stationary one.

@agrayver
Author

Generally, this depends to a large extent on the initial conditions, i.e. the initial parameters and standard deviations used.

Just as MCMC methods depend on a prior density function and a whole bunch of other tweaks.

Even though CMA-ES is quite similar to MCMC methods, CMA-ES will never sample from a stationary distribution. That makes under some aspects a rather big difference.

Thanks for pointing that out, I see now. Well, at least upon convergence, the statistics collected over the last n << n_iter steps should be valid, at least locally (assuming a quadratic approximation of the function)?

@nikohansen
Collaborator

Aggregating statistics of sampled points over several iterations remains questionable unless done with great care. This is, incidentally, exactly what CMA-ES does: it aggregates statistics over a certain backward time horizon of iterations, but with careful (time-variant) re-normalizations, such that the samples across iterations become comparable (i.e. stationary, if you like).

@agrayver
Author

@beniz I have tested errors(...) and the standard deviations it returns depend on the scaling strategy I use, e.g. they are (vastly) different for NoScalingStrategy and linScalingStrategy. Is this supposed to be so?

@nikohansen I see your point. I am currently trying to make sense of the values returned by the errors() method that Emmanuel introduced recently. From the physics of my system I know that some parameters are much worse determined than others, i.e. the physical system under consideration has different sensitivities for different parameters, and I believe this has to be reflected in the covariance matrix somehow (at least in MCMC and deterministic methods this is the case). However, errors() always gives me values like these:
1.00015
1.00016
1.00016
1.00016
1.00014
1.00014
1.00013
1.00012
1.00011
1.00011
1.0001
1.0001
1.0001
1.00009
1.00009
1.0001
1.00008
1.00008
1.00008

These values are not particularly insightful. Where am I going wrong?

@beniz
Collaborator

beniz commented Mar 12, 2015

I have tested errors(...) and the standard deviations it returns depend on the scaling strategy I use, e.g. they are (vastly) different for NoScalingStrategy and linScalingStrategy. Is this supposed to be so?

I guess they shouldn't be. Basically, errors() is stds() rescaled by sigma, on top of which the phenotype transform is applied...

@agrayver
Author

stds() may also be affected by this; I have not tested it. Here is what I get with NoScalingStrategy:
1.08905
1.08523
1.08039
...
and linScalingStrategy:
0.000113827
0.000113704
0.000113688

@beniz
Collaborator

beniz commented Mar 12, 2015

I can't test right now, but can you report what you are getting on the sphere function, for instance?

@nikohansen
Collaborator

@agrayver these are the errors you could get after a very small number of iterations with an initial sigma very close to one. Otherwise, you should get something different (more meaningful) with probability close to one; see e.g. #131

std deviations: [  4.24406277e-08   3.07314735e-08   2.21571269e-08   1.39661633e-08  1.23903235e-08   7.99126636e-09   6.43127205e-09   4.14453821e-09 ...]

@agrayver
Author

@nikohansen those are the values I am getting at the final (~3000th) iteration, when reaching the minimum. I guess one has to check these with/without bounds and/or scaling on some test functions.

@beniz
Collaborator

beniz commented Mar 13, 2015

OK, I believe I've got it. stds and errors are relative to the mean, so more care must be taken when applying the pheno transform. The solution is to bring both the mean and the mean plus error into phenotype space and take the difference there. I believe this should be generic enough to accommodate non-linear transforms as well.

I will commit a fix in coming days, but if you want to compute the errors properly with linear scaling right now, the code below should work fine:

dVec phen_xmean = cmaparams.get_gp().pheno(cmasols.xmean());
dVec stds = cmasols.cov().diagonal().cwiseSqrt();
dVec phen_xmean_std = cmaparams.get_gp().pheno(static_cast<dVec>(cmasols.xmean() + stds));
dVec est_errors = std::sqrt(cmasols.sigma())*(phen_xmean_std - phen_xmean);

where cmaparams is your CMAParameters object and cmasols your CMASolutions object. The code above yields the correct results for me with and without linear scaling.

@agrayver
Author

@beniz maybe you need to add fabs, because occasionally I am getting negative values for some parameters.

@agrayver
Author

It is also not clear to me how the statistics are affected by the non-linear transformations used inside the library. I mean, if you calculate statistics in the transformed space, how do you convert them back into parameter space? Simply applying the back-transformation will not (in general) give the correct answer.
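One way around this, in the spirit of the resampling idea discussed earlier in this thread, is to push samples (rather than the statistics themselves) through the transform. A minimal 1-D sketch, where the scalar `pheno` callable is a stand-in for the library's geno/pheno machinery, not its actual API:

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Estimate the phenotype-space standard deviation by drawing samples from
// the genotype-space distribution, mapping each sample through the (possibly
// non-linear) pheno transform, and computing the sample statistics there.
// For a non-linear transform this is generally NOT the same as transforming
// the genotype std directly.
double pheno_std(double geno_mean, double geno_std,
                 const std::function<double(double)>& pheno,
                 int nsamples = 10000, unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::normal_distribution<double> dist(geno_mean, geno_std);
  std::vector<double> ys(nsamples);
  double mean = 0.0;
  for (int i = 0; i < nsamples; ++i) {
    ys[i] = pheno(dist(gen));  // map each genotype sample to phenotype space
    mean += ys[i];
  }
  mean /= nsamples;
  double var = 0.0;
  for (double y : ys) var += (y - mean) * (y - mean);
  return std::sqrt(var / (nsamples - 1));  // unbiased sample std
}
```

The same idea extends to the multivariate case by sampling with a Cholesky factor of sigma^2 * C and computing the full sample covariance in phenotype space.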

@beniz
Collaborator

beniz commented Mar 16, 2015

FYI, a final fix is now available for stds and errors with scaling and geno/pheno linear/non-linear transforms.

@agrayver
Author

I have now resumed the experiments. I am still trying to understand the nature of the covariance matrix the library returns. I have constructed a convex, noise-free problem and found the minimum using a quasi-Newton method.
Here is how the inverse of the Hessian (which, up to a constant scaling factor, should be the covariance at the minimum) looks:
[image: newton]

The minimum point found by aCMA-ES is very similar to that from the quasi-Newton method, but the covariance matrix looks different:
[image: cmaes]

While some similarities are present, the matrices are far from identical. I wonder where this difference comes from?

@nikohansen
Collaborator

You might want to run the experiment with CMA-ES again with a different random seed and/or initial point, and compare the result to the above. This will show the amount of stochastic deviation involved, which I reckon is the reason for the difference.

EDIT: investigating the eigenvalues of the covariance matrix over time, as shown in the default plots, reveals whether a stable configuration has been reached. It also reveals, at least in part, the stochastic deviations/fluctuations we expect to see in the end by chance.

@agrayver
Author

Here are three more runs with different random seeds:
[image: cmaes1]
[image: cmaes2]
[image: cmaes3]

Admittedly, they are all similar. On the other hand, there is a unique Hessian at the minimum, from which I can get a locally valid covariance matrix.
The ultimate question, if it makes sense at all, is which one is correct or better?

@loshchil

if it makes sense at all, is which one is correct or better?

the average/median of normalized results from N>>1 runs?


@nikohansen
Collaborator

As we know the true Hessian matrix, we know which matrix is correct, and, depending on a distance measure of your choice, you can compute which one is better, i.e. closer to the true value.
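As a concrete example of such a distance measure (one choice among many, not something the library provides): normalize both matrices to unit trace to remove the unknown scalar factor, then compare them with the Frobenius norm:

```cpp
#include <cmath>
#include <vector>

// Scale-invariant comparison of two covariance-like matrices (row-major,
// n x n): divide each by its trace, so the unknown multiplicative factor
// between them drops out, then return the Frobenius norm of the difference.
// A result of 0 means the matrices are equal up to a scalar factor.
double normalized_frobenius_distance(const std::vector<double>& A,
                                     const std::vector<double>& B, int n) {
  double trA = 0.0, trB = 0.0;
  for (int i = 0; i < n; ++i) {
    trA += A[i * n + i];
    trB += B[i * n + i];
  }
  double d2 = 0.0;
  for (int k = 0; k < n * n; ++k) {
    const double diff = A[k] / trA - B[k] / trB;
    d2 += diff * diff;
  }
  return std::sqrt(d2);
}
```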

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016
…dard deviations + removed approximation of deviations when using vdcma, ref CMA-ES#129
andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016
…ettings + full_cov() function that return full size covariance matrix, even with sep and vd algorithms, ref CMA-ES#129