Structure metadata and additional properties in output documents #502

gpetretto · 2023-09-01T07:27:25Z

Flows like the ones calculating elastic tensors or phonons usually produce a final document containing the parsed results but no default metadata about the structure or how the calculation was performed. This makes somewhat difficult to retrieve the outputs, as one does not have any obvious way to filter them. I think it would be usuful to convert the output documents (e.g. ElasticDocument and PhononBSDOSDoc ) to be subclasses of StructureMetadata. I don't think this will add much data to the DB, but could be useful.

Another related point is the fact that the final document does not contain the input parameters used to perform the simulation. To retrieve that information one would first need to match the final output to the perturbation jobs and extract the information from there. While this is not too complex, it might be still be a bit involved for a casual user. It could thus be interesting to store the input parameters of one of the main steps of the flow in the final document (e.g. one of the perturbations for elastic or phonon flow). However this would be code dependent and not necessarily representative of all the steps of the flow, so it may be less straightforward than the structure metadata.

If any of these options seem acceptable I can try to work on it.

The text was updated successfully, but these errors were encountered:

utf · 2023-09-01T08:22:35Z

I think making those documents a subclass of StructureMetadata makes sense.

Personally, I'm less keen on adding calculation parameters for the reason that you mentioned - these are code specific so creating a general schema will be difficult and could change in the future, e.g., with ML force fields. Both the ElasticDocument and PhononBSDOSDoc store the uuids of the calculations used to generate them. E.g., ElasticDocument.fitting_data.uuids and PhononBSDOSDoc.uuids. So you shouldn't need to do any matching. It is just one extra query to get those parameters.

JaGeo · 2023-09-01T08:37:05Z

Maybe,however, we could add information to the documentation on how to retrieve the static computations belonging to an elastic or phonon computation? It might help clarifying this for all users.

utf · 2023-09-01T09:10:34Z

Agreed, that would be very useful.

gpetretto · 2023-09-01T13:25:34Z

Thanks for the feedback. Indeed I was using a needlessly complicated approach to query the input parameters.
I will open a PR.

mkhorton · 2023-09-01T15:51:10Z

+1 for StructureMetadata, in fact I believe the ElasticDoc in emmet already does this via sub-classing PropertyDoc, sub-classing StructureMetadata. I don't recall specific details of PropertyDoc but I do like that it enforces some level of standardization between documents for different properties.

I strongly support keeping calculation details out of the document model, except maybe in a very abstract way (e.g. something like an originating task_id key, or a generic method field or similar).

In general, I'm observing that it's very difficult to stop a proliferation of document models. There's already another elastic property document model being proposed (tagging @janosh) for ML-generated elastic properties, despite the fact the property itself is the same. We have multiple document models for the same task in different repos, or even within the same repo etc.

What I'm proposing/asking for is that we adopt a very disciplined approach for keeping document models focused and consistent, e.g. one and only one model per property, and thinking very carefully about what fields to include and how to define units etc. In the long term I think this will greatly benefit us (e.g. allowing analysis that combines property data generated via multiple methods), and avoid very costly database migrations (e.g. if a document model changes in a breaking way).

janosh · 2023-09-01T16:00:01Z

In general, I'm observing that it's very difficult to stop a proliferation of document models. There's already another elastic property document model being proposed (tagging @janosh) for ML-generated elastic properties, despite the fact the property itself is the same. We have multiple document models for the same task in different repos, or even within the same repo etc.

I strongly agree. Also, very timely observation that applies more generally across the MP stack considering the current discussion to deduplicate pymatgen vs atomate2 input sets. 😄

JaGeo · 2023-09-01T17:13:53Z

Btw, same applies probably to phonons as well. Also outputs of workflows to compute phonons with forcefields.

utf · 2023-09-04T16:39:26Z

Fixed in #514

gpetretto mentioned this issue Sep 4, 2023

Add StructureMetadata as baseclass for output documents #514

Merged

utf closed this as completed Sep 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structure metadata and additional properties in output documents #502

Structure metadata and additional properties in output documents #502

gpetretto commented Sep 1, 2023

utf commented Sep 1, 2023 •

edited

Loading

JaGeo commented Sep 1, 2023

utf commented Sep 1, 2023

gpetretto commented Sep 1, 2023

mkhorton commented Sep 1, 2023

janosh commented Sep 1, 2023

JaGeo commented Sep 1, 2023 •

edited

Loading

utf commented Sep 4, 2023

Structure metadata and additional properties in output documents #502

Structure metadata and additional properties in output documents #502

Comments

gpetretto commented Sep 1, 2023

utf commented Sep 1, 2023 • edited Loading

JaGeo commented Sep 1, 2023

utf commented Sep 1, 2023

gpetretto commented Sep 1, 2023

mkhorton commented Sep 1, 2023

janosh commented Sep 1, 2023

JaGeo commented Sep 1, 2023 • edited Loading

utf commented Sep 4, 2023

utf commented Sep 1, 2023 •

edited

Loading

JaGeo commented Sep 1, 2023 •

edited

Loading