Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structure metadata and additional properties in output documents #502

Closed
gpetretto opened this issue Sep 1, 2023 · 8 comments
Closed

Structure metadata and additional properties in output documents #502

gpetretto opened this issue Sep 1, 2023 · 8 comments

Comments

@gpetretto
Copy link
Contributor

Flows like the ones calculating elastic tensors or phonons usually produce a final document containing the parsed results but no default metadata about the structure or how the calculation was performed. This makes somewhat difficult to retrieve the outputs, as one does not have any obvious way to filter them. I think it would be usuful to convert the output documents (e.g. ElasticDocument and PhononBSDOSDoc ) to be subclasses of StructureMetadata. I don't think this will add much data to the DB, but could be useful.

Another related point is the fact that the final document does not contain the input parameters used to perform the simulation. To retrieve that information one would first need to match the final output to the perturbation jobs and extract the information from there. While this is not too complex, it might be still be a bit involved for a casual user. It could thus be interesting to store the input parameters of one of the main steps of the flow in the final document (e.g. one of the perturbations for elastic or phonon flow). However this would be code dependent and not necessarily representative of all the steps of the flow, so it may be less straightforward than the structure metadata.

If any of these options seem acceptable I can try to work on it.

@utf
Copy link
Member

utf commented Sep 1, 2023

I think making those documents a subclass of StructureMetadata makes sense.

Personally, I'm less keen on adding calculation parameters for the reason that you mentioned - these are code specific so creating a general schema will be difficult and could change in the future, e.g., with ML force fields. Both the ElasticDocument and PhononBSDOSDoc store the uuids of the calculations used to generate them. E.g., ElasticDocument.fitting_data.uuids and PhononBSDOSDoc.uuids. So you shouldn't need to do any matching. It is just one extra query to get those parameters.

@JaGeo
Copy link
Member

JaGeo commented Sep 1, 2023

Maybe,however, we could add information to the documentation on how to retrieve the static computations belonging to an elastic or phonon computation? It might help clarifying this for all users.

@utf
Copy link
Member

utf commented Sep 1, 2023

Agreed, that would be very useful.

@gpetretto
Copy link
Contributor Author

Thanks for the feedback. Indeed I was using a needlessly complicated approach to query the input parameters.
I will open a PR.

@mkhorton
Copy link
Member

mkhorton commented Sep 1, 2023

+1 for StructureMetadata, in fact I believe the ElasticDoc in emmet already does this via sub-classing PropertyDoc, sub-classing StructureMetadata. I don't recall specific details of PropertyDoc but I do like that it enforces some level of standardization between documents for different properties.

I strongly support keeping calculation details out of the document model, except maybe in a very abstract way (e.g. something like an originating task_id key, or a generic method field or similar).

In general, I'm observing that it's very difficult to stop a proliferation of document models. There's already another elastic property document model being proposed (tagging @janosh) for ML-generated elastic properties, despite the fact the property itself is the same. We have multiple document models for the same task in different repos, or even within the same repo etc.

What I'm proposing/asking for is that we adopt a very disciplined approach for keeping document models focused and consistent, e.g. one and only one model per property, and thinking very carefully about what fields to include and how to define units etc. In the long term I think this will greatly benefit us (e.g. allowing analysis that combines property data generated via multiple methods), and avoid very costly database migrations (e.g. if a document model changes in a breaking way).

@janosh
Copy link
Member

janosh commented Sep 1, 2023

In general, I'm observing that it's very difficult to stop a proliferation of document models. There's already another elastic property document model being proposed (tagging @janosh) for ML-generated elastic properties, despite the fact the property itself is the same. We have multiple document models for the same task in different repos, or even within the same repo etc.

I strongly agree. Also, very timely observation that applies more generally across the MP stack considering the current discussion to deduplicate pymatgen vs atomate2 input sets. 😄

@JaGeo
Copy link
Member

JaGeo commented Sep 1, 2023

Btw, same applies probably to phonons as well. Also outputs of workflows to compute phonons with forcefields.

@utf
Copy link
Member

utf commented Sep 4, 2023

Fixed in #514

@utf utf closed this as completed Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants