
feat: support terminology codes #106

Open · wants to merge 24 commits into main


@cmdoret (Member) commented Oct 1, 2024

Summary

This adds support for terminology codes instead of free text for specific metadata fields, along with autocomplete suggestions in the terminal.

Currently, the following properties/terminologies are used:

Major changes

  • Refactor CLI code: move prompt utilities to a dedicated module (modos.prompt)
  • Add a code-matching module (modos.codes)
  • Implement a CodeMatcher protocol with two implementations (remote and local)
    • Remote is used if an endpoint is provided (completion runs on the server, which is faster)
    • Local is used as a fallback (the ontology is downloaded and matching runs on the client machine)
  • Add fuzon-http as a service in the modos server deployment
  • Add pyfuzon as an extra (optional) dependency for local code matching
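The remote/local split can be pictured as a `typing.Protocol` with two implementations and a selector that prefers the server. This is a minimal sketch, not the actual modos.codes code: the class names, the `top()` method, `get_matcher()`, the `terminology` parameter and the remote HTTP route are all hypothetical, and `difflib` stands in for fuzon's fuzzy matching.

```python
import difflib
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional, Protocol


class CodeMatcher(Protocol):
    """Anything that can suggest terminology codes for a free-text query."""

    def top(self, query: str, n: int = 5) -> list[tuple[str, str]]:
        """Return up to n (uri, label) pairs, best match first."""
        ...


class RemoteCodeMatcher:
    """Delegates matching to a server-side service such as fuzon-http."""

    def __init__(self, endpoint: str, terminology: str) -> None:
        self.endpoint = endpoint.rstrip("/")
        self.terminology = terminology

    def top(self, query: str, n: int = 5) -> list[tuple[str, str]]:
        # HYPOTHETICAL route, query parameters and JSON shape; not the fuzon-http API.
        params = urllib.parse.urlencode(
            {"terminology": self.terminology, "query": query, "top": n}
        )
        with urllib.request.urlopen(f"{self.endpoint}/top?{params}") as resp:
            return [(uri, label) for uri, label in json.load(resp)]


class LocalCodeMatcher:
    """Ranks an in-memory {uri: label} terminology on the client machine."""

    def __init__(self, terms: dict[str, str]) -> None:
        self.terms = terms

    def top(self, query: str, n: int = 5) -> list[tuple[str, str]]:
        # difflib's similarity ratio stands in for fuzon's scoring in this sketch.
        def score(item: tuple[str, str]) -> float:
            return difflib.SequenceMatcher(None, query.lower(), item[1].lower()).ratio()

        return sorted(self.terms.items(), key=score, reverse=True)[:n]


def get_matcher(
    terminology: str,
    load_terms: Callable[[], dict[str, str]],
    endpoint: Optional[str] = None,
) -> CodeMatcher:
    """Use the server when an endpoint is given, otherwise fall back to local matching."""
    if endpoint:
        return RemoteCodeMatcher(endpoint, terminology)
    # Only the local path needs the terminology, so it is loaded lazily here.
    return LocalCodeMatcher(load_terms())
```

Passing a loader instead of the terms themselves keeps the (slow) ontology download off the remote path entirely, which is the point of preferring the server.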

Trying it

To test local autocomplete in the terminal:

modos create data/example
modos add data/example sample

To rely on the server for autocomplete:

make deploy
modos --endpoint=http://localhost create data/example
modos --endpoint=http://localhost add data/example sample

Notes

Codes are recommended based on the similarity between the user's input and the term labels, but only the URIs are persisted in the metadata.
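A minimal sketch of this label-versus-URI split, with hypothetical helper names (`suggest`, `record_choice`) and `difflib.get_close_matches` in place of fuzon's scoring: labels are only used for ranking and display, while the metadata field receives the URI of the chosen suggestion.

```python
import difflib


def suggest(query: str, terms: dict[str, str], n: int = 3) -> list[tuple[str, str]]:
    """Rank (label, uri) pairs by how similar each label is to the user's query."""
    # Assumes labels are unique within one terminology.
    by_label = {label: uri for uri, label in terms.items()}
    labels = difflib.get_close_matches(query, list(by_label), n=n, cutoff=0.0)
    return [(label, by_label[label]) for label in labels]


def record_choice(metadata: dict, field: str, suggestions: list[tuple[str, str]], index: int) -> dict:
    """Persist only the URI of the chosen suggestion; the label is display-only."""
    _label, uri = suggestions[index]
    metadata[field] = uri
    return metadata
```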

Follow-up (separate issues):

  • speed up the terminology download when using the local completer
    • terminology caching (plus asynchronous download in the background)

Open questions

When creating a modos from an input YAML file instead of interactively (see data/ex_config.yaml), URIs are now required for the three properties listed above.

It may be painful for users to find out which URIs to put in the YAML. Should we provide a subcommand just to look up codes (essentially a fuzon wrapper)?
Perhaps something along these lines:

# modos codes <property> <query>
modos codes cell_type "red blood cell"
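The proposed subcommand could be wired roughly as follows. This sketch uses `argparse` purely for illustration (the real modos CLI framework may differ); the `TERMINOLOGIES` table, its example.org URIs, the `--top` option and the tab-separated output are hypothetical, and a real version would query fuzon rather than an in-memory dict.

```python
import argparse
import difflib

# HYPOTHETICAL in-memory terminologies keyed by metadata property.
TERMINOLOGIES = {
    "cell_type": {
        "http://example.org/cell/1": "red blood cell",
        "http://example.org/cell/2": "T cell",
        "http://example.org/cell/3": "neuron",
    },
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="modos")
    sub = parser.add_subparsers(dest="command", required=True)
    codes = sub.add_parser("codes", help="look up terminology codes for a property")
    codes.add_argument("property", choices=sorted(TERMINOLOGIES))
    codes.add_argument("query")
    codes.add_argument("--top", type=int, default=3)
    return parser


def lookup(prop: str, query: str, top: int) -> list[str]:
    """Return 'uri<TAB>label' lines for the labels closest to the query."""
    by_label = {label: uri for uri, label in TERMINOLOGIES[prop].items()}
    labels = difflib.get_close_matches(query, list(by_label), n=top, cutoff=0.0)
    return [f"{by_label[label]}\t{label}" for label in labels]


def main(argv=None) -> list[str]:
    args = build_parser().parse_args(argv)
    lines = lookup(args.property, args.query, args.top)
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    main()
```

Printing the URI first makes it easy to copy straight into the YAML config, which is the pain point raised above.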

@cmdoret self-assigned this on Oct 1, 2024
@cmdoret linked an issue on Oct 1, 2024 that may be closed by this pull request

Review comment by @cmdoret (Member, Author) on:

RUN apt-get update && apt-get install -y git

RUN git clone https://github.com/sdsc-ordes/fuzon.git --branch feat/fuzon-http .

TODO: clone a release instead of the branch once the PR is merged


Successfully merging this pull request may close these issues.

[Feature request]: Support terminology codes