Units of the covariance matrix #129
Good point. The covariance matrix is definitely in genotype space. The lib has no facility yet to bring it back into phenotype space. Note that the covariance is also scaled by the step-size. I will definitely look into this.
I would consider looking at the correlation matrix rather than the covariance matrix. This also has the advantage of being unit-less, by rescaling the given coordinates to unit one. In general, I find any such data comparatively hard to interpret, compared, for example, to the standard deviations in the principal axes. If you are only interested in the standard deviations/variances in the given coordinates, they should be comparatively easy to get and interpret (they can be derived from the standard plots).

Re: genotype/phenotype. Linear transformations can easily be applied to the covariance matrix. Furthermore, the correlation matrix is invariant under parameter rescaling. I would expect that recomputing the covariance matrix from the distribution of the given (phenotype) candidate solutions will work poorly, because step-size changes can easily prevent one from seeing anything stable or reasonable. However, one could just sample around
This applies to the linear scaling, but I was worried about the bound transforms, which are not linear close to the boundary, IIRC.
I believe I need to bring something like this to the lib, if only for ROOT users, because otherwise the covariance is in geno space.
@agrayver FYI, and before I put in a getter for the standard deviations, here is how to get them:

```cpp
// full matrix CMA-ES etc.
dVec stds = cmaparams.get_gp().pheno((dVec)cmasols.cov().sqrt().diagonal()).transpose();

// sep-CMA-ES
dVec stds = cmaparams.get_gp().pheno((dVec)cmasols.sepcov().cwiseSqrt()).transpose();

// VD-CMA-ES
// C = D(I+vv')D, and we use D^2 as an approximation; here sepcov() yields D
dVec stds = cmaparams.get_gp().pheno(cmasols.sepcov()).transpose();
```

Note that I need to look again at VD-CMA-ES as I don't remember why I left this approximation in. EDIT: VD-CMA
…dard deviations + removed approximation of deviations when using vdcma, ref #129
New CMASolutions::stds() function to get the standard deviations, and I've gotten rid of the approximation when running VD-CMA-ES. I am planning two more functions, one for the sampled variance, the other for the correlation matrix.
Emmanuel & Nikolaus, thanks for the comments. I will use CMASolutions::stds() now. Looking forward to having a method for the correlation matrix; this is useful as well.
…ettings + full_cov() function that return full size covariance matrix, even with sep and vd algorithms, ref #129
The commits above add
Great! It works here, thanks.
I compared the standard deviations the library returns via stds(cmaparams) with those obtained by direct calculation using the best candidates over all iterations (~4000). They do not match at all. If this is supposed to be so, then I am still missing what the quantities returned by stds(cmaparams) represent. Sorry if I am missing something trivial.
I'm not sure exactly what you are doing, but in all cases
Re stds: Without incorporating
I don't quite see what a variation computed over all iterations would tell us. What do you want to know? The "true" standard deviations (i.e. including
My aim is to find means of assessing the model I get after the minimum has been reached. For instance, I want to know the variance of the parameters, their interdependence, plot histograms, etc. Those are standard things people do when they apply inference methods such as MCMC. Since, on the way to the minimum, CMA-ES samples the parameter space quite a few times, this information can somehow be used. Maybe I am wrong and should consider CMA-ES merely a minimization method?
Generally, this depends to a large extent on the initial conditions, i.e. the initial parameters and standard deviations used.
Even though CMA-ES is quite similar to MCMC methods, CMA-ES never samples from a stationary distribution. Under some aspects, that makes a rather big difference. On a positive note, up to a multiplicative scalar factor, the final covariance matrix is often a stationary one.
Same as MCMC methods depend on a prior density function and a whole bunch of other tweaks.
Thanks for pointing that out, I see now. Well, at least upon convergence, the statistics collected from the last n << n_iter steps should be valid, at least locally (assuming a quadratic approximation of the function)?
Aggregating statistics of sampled points over several iterations remains questionable unless done with great care. This is in fact exactly what CMA-ES does: it aggregates statistics over a certain backward time horizon of iterations, but with careful (time-variant) re-normalizations, such that the samples across iterations become comparable (i.e. stationary, if you like).
@beniz I have tested errors(...) and the standard deviations it returns depend on the scaling strategy I use, e.g. they are (vastly) different for NoScalingStrategy and linScalingStrategy. Is this supposed to be so?

@nikohansen I see your point. Currently I am trying to make sense of the values returned by the errors() method that Emmanuel introduced recently. From the physics of my system I know that some parameters are much worse determined than others, i.e. the physical system under consideration has different sensitivities for different parameters, and I believe this has to be reflected in the covariance matrix somehow (at least in MCMC and deterministic methods this is the case). However, errors() keeps giving me values like these: These values are not particularly insightful. Where am I wrong?
I guess they shouldn't be. Basically, errors() is stds() rescaled by sigma, over which the phenotype is applied...
stds() may also be affected by this; I have not tested. Here is what I get with NoScalingStrategy:
I can't test right now, but can you report what you are getting on sphere, for instance?
@agrayver these are the errors you could get after a very small number of iterations with initial
@nikohansen those are the values I am getting at the final (3000th) iteration, when reaching the minimum. I guess one has to check these with/without bounds and/or scaling on some test functions.
OK, I believe I got it. stds and errors are relative to the mean, so more care should be taken when applying the pheno transform. The solution is to bring the mean and the mean-plus-error to the phenotype and take the difference there. I believe this should be generic enough to accommodate non-linear transforms as well. I will commit a fix in the coming days, but if you want to compute the errors properly with linear scaling right now, the code below should work fine:

```cpp
dVec phen_xmean = cmaparams.get_gp().pheno(cmasols.xmean());
dVec stds = cmasols.cov().diagonal().cwiseSqrt();
dVec phen_xmean_std = cmaparams.get_gp().pheno(static_cast<dVec>(cmasols.xmean() + stds));
dVec est_errors = std::sqrt(cmasols.sigma())*(phen_xmean_std - phen_xmean);
```

where
@beniz maybe you need to add fabs, because occasionally I am getting negative values for some parameters.
It is also not clear to me how the statistics are affected by the non-linear transformation used inside the library. I mean, if you calculate statistics in the transformed space, how do you convert them back into parameter space? Simply applying the back-transformation will not (in general) give the correct answer.
FYI, a final fix is now available for
You might want to run the experiment with CMA-ES again with a different random seed and/or initial point, and compare the result to the above. This will show the amount of stochastic deviation involved, which I reckon is the reason for the difference. EDIT: investigating the eigenvalues of the covariance matrix over time, as shown in the default plots, reveals whether a stable configuration has been reached. It also reveals, at least in part, the stochastic deviations/fluctuations we expect to see in the end by chance.
the average/median of normalized results from N>>1 runs?
As we know the true Hessian matrix, we know which matrix is correct and, depending on a distance measure of your choice, you can compute which one is better, i.e. closer to the true value. |
After convergence, I can request the covariance matrix from CMASolutions. I wonder, what do these covariances represent? Are they in geno or pheno space? What do I need to do in order to get the covariance matrix in the original parameter units (more precisely, if my parameters have units of m, then the covariances should be in m^2)?