How should output be structured? #168

liamhuber · 2024-01-02T19:54:19Z

A bunch of the new content since mid-December takes the idea of atomistics.shared.Output and really runs with it, using the following paradigm:

Define a new dataclass (template) to specify what fields are required
Define functions to actually do science
Use the functions to make a new "engine"
Define another new dataclass (actually gets used) and populate its callables with properties of the new engine

This creates an awful lot of code duplication, which is one of the main things the Output class was introduced to fix. It also feels very over-engineered: in many (all?) of the new cases, there is no need for an engine since there is only a single pure-python "engine" defined, and that's all we'll ever need.

In #166 I show how the two dataclass and engine can all be rolled into a single class, so that (when an engine is not truly required), the pattern looks like this:

Define functions to actually do science
Define a new output class

We could, in principle, jam the science functions into the class itself as well, but I am in favour of the current structure where each one is broken out as an individual function for the sake of downstream developers who may want to import them individually.

Then I thought about how we can abstract this. What do we want Output to really promise? IMO we want:

A predefined set of names for collections of meaningful data
- Accessible at the class level!
A helper method to easily grab some subset of this collection

With this in mind, in #167 I made a new, truly abstract Output class to codify and document these requirements, then introduce a new abstract class for cases where we don't need an intermediate engine, and re-named the current Output to EngineOutput and left it functioning as-is.

Code duplication adds to the weight of maintenance, makes it harder for new devs to learn the code base, and increases the chance of a bug going unnoticed; we should aim for abstractions whenever they can be made without introducing significant complexity.

IMO the new PropertyEngine and the pattern of #166 can be applied to the rest of the new, verbose features that got added. But before jamming that through I thought we should take the elastic case as a chance to reflect on what (if any) changes need to be made to the solution in #167, or whether some totally different attack is called for.

The text was updated successfully, but these errors were encountered:

liamhuber · 2024-01-02T20:25:17Z

One of the big problems I see with my proposal in #167 is that it is not easy to make output that combines the engine and property approaches. The existing implementation does this, but at the cost of these very verbose middle-man engines.

I think this lack of being able to combine them is a deal breaker so I closed #167 and #166, but I still want to think on how we can slim things down.

liamhuber · 2024-01-02T21:20:58Z

With the discussion in #169 and #157, I can see that it may indeed be worthwhile plugging in pseudo-"engines" consistently. In #170 I do a bit of work to condense such engines and make them more readable.

In the process of this, and especially with #170 in mind, I realized that we may want to define an output instance that uses different engines for different properties. E.g. perhaps we want to return MD output using a Lammps engine and some sort of elastic info using the ElasticProperties engine in the same data bundle.

A couple different syntaxes come to mind:

# Duplicate our existing input
some_output.get((engine_1, "field_a", "field_f"), (engine_2, "field_b"))

# As pairs
some_output.get(("field_a", engine_1), ("field_f", engine_1), ("field_b", engine_2))

# just the dict version of pairs
some_output.get({"field_a": engine_1, "field_f": engine_1, "field_b": engine_2)

I think I like the first one best as it is most like the existing syntax.

liamhuber assigned jan-janssen, liamhuber and samwaseda Jan 2, 2024

liamhuber mentioned this issue Jan 8, 2024

Improve abstraction for data collections (e.g. Output) #181

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How should output be structured? #168

How should output be structured? #168

liamhuber commented Jan 2, 2024

liamhuber commented Jan 2, 2024

liamhuber commented Jan 2, 2024

How should output be structured? #168

How should output be structured? #168

Comments

liamhuber commented Jan 2, 2024

liamhuber commented Jan 2, 2024

liamhuber commented Jan 2, 2024