Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should output be structured? #168

Open
liamhuber opened this issue Jan 2, 2024 · 2 comments
Open

How should output be structured? #168

liamhuber opened this issue Jan 2, 2024 · 2 comments
Assignees

Comments

@liamhuber
Copy link
Member

A bunch of the new content since mid-December takes the idea of atomistics.shared.Output and really runs with it, using the following paradigm:

  • Define a new dataclass (template) to specify what fields are required
  • Define functions to actually do science
  • Use the functions to make a new "engine"
  • Define another new dataclass (actually gets used) and populate its callables with properties of the new engine

This creates an awful lot of code duplication, which is one of the main things the Output class was introduced to fix. It also feels very over-engineered: in many (all?) of the new cases, there is no need for an engine since there is only a single pure-python "engine" defined, and that's all we'll ever need.

In #166 I show how the two dataclass and engine can all be rolled into a single class, so that (when an engine is not truly required), the pattern looks like this:

  • Define functions to actually do science
  • Define a new output class

We could, in principle, jam the science functions into the class itself as well, but I am in favour of the current structure where each one is broken out as an individual function for the sake of downstream developers who may want to import them individually.

Then I thought about how we can abstract this. What do we want Output to really promise? IMO we want:

  • A predefined set of names for collections of meaningful data
    • Accessible at the class level!
  • A helper method to easily grab some subset of this collection

With this in mind, in #167 I made a new, truly abstract Output class to codify and document these requirements, then introduce a new abstract class for cases where we don't need an intermediate engine, and re-named the current Output to EngineOutput and left it functioning as-is.

Code duplication adds to the weight of maintenance, makes it harder for new devs to learn the code base, and increases the chance of a bug going unnoticed; we should aim for abstractions whenever they can be made without introducing significant complexity.

IMO the new PropertyEngine and the pattern of #166 can be applied to the rest of the new, verbose features that got added. But before jamming that through I thought we should take the elastic case as a chance to reflect on what (if any) changes need to be made to the solution in #167, or whether some totally different attack is called for.

@liamhuber
Copy link
Member Author

One of the big problems I see with my proposal in #167 is that it is not easy to make output that combines the engine and property approaches. The existing implementation does this, but at the cost of these very verbose middle-man engines.

I think this lack of being able to combine them is a deal breaker so I closed #167 and #166, but I still want to think on how we can slim things down.

@liamhuber
Copy link
Member Author

With the discussion in #169 and #157, I can see that it may indeed be worthwhile plugging in pseudo-"engines" consistently. In #170 I do a bit of work to condense such engines and make them more readable.

In the process of this, and especially with #170 in mind, I realized that we may want to define an output instance that uses different engines for different properties. E.g. perhaps we want to return MD output using a Lammps engine and some sort of elastic info using the ElasticProperties engine in the same data bundle.

A couple different syntaxes come to mind:

# Duplicate our existing input
some_output.get((engine_1, "field_a", "field_f"), (engine_2, "field_b"))
# As pairs
some_output.get(("field_a", engine_1), ("field_f", engine_1), ("field_b", engine_2))
# just the dict version of pairs
some_output.get({"field_a": engine_1, "field_f": engine_1, "field_b": engine_2)

I think I like the first one best as it is most like the existing syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants