Improve handling of JabRef embedding models #12240

@InAnYan

Is your suggestion for improvement related to a problem? Please describe.

Currently, JabRef provides a way to view and download embedding models from the DJL ModelZoo. JabRef also stores the size of each embedding model.

However, this is badly implemented (another dark chapter of my GSoC project).

Problems are:

  1. This list is not auto-updated. Actually, this list is ... hard-coded.
  2. The model sizes in this list are not calculated properly.
  3. There is no way to see which models are already downloaded.
  4. There is no way to delete old or unused models.
  5. Embedding models access the Internet without the user's agreement (agreeing to use AI features != agreeing to any Internet connection). There should be a way to download a model on one computer and then transfer it to another. The open questions are: What should be downloaded? Where is it stored? Which files need to be transferred? Where should they be placed for JabRef?

Describe the solution you'd like

  1. Provide a list of available models using the actual DJL API (an up-to-date list).
  2. Add the ability to download a model.
  3. Add the ability to select a model to use in AI features.
  4. Provide a way to list downloaded models.
  5. Provide the ability to delete a downloaded model.
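To make points 4 and 5 (and the size calculation from the problem list) more concrete, here is a minimal sketch using only the Java standard library. It assumes that every downloaded model lives in its own sub-directory of a single models folder. That layout, the class name `LocalModelStore`, and all method names are hypothetical and not taken from JabRef or DJL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper; assumes one sub-directory per downloaded model. */
final class LocalModelStore {
    private LocalModelStore() {
    }

    /** Sums the sizes of all regular files below the given directory. */
    static long sizeInBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(file -> {
                            try {
                                return Files.size(file);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /** Maps each model directory name to its total size on disk. */
    static Map<String, Long> listDownloaded(Path modelsRoot) throws IOException {
        Map<String, Long> result = new TreeMap<>();
        if (!Files.isDirectory(modelsRoot)) {
            return result; // nothing downloaded yet
        }
        List<Path> modelDirs;
        try (Stream<Path> children = Files.list(modelsRoot)) {
            modelDirs = children.filter(Files::isDirectory).collect(Collectors.toList());
        }
        for (Path modelDir : modelDirs) {
            result.put(modelDir.getFileName().toString(), sizeInBytes(modelDir));
        }
        return result;
    }

    /** Deletes one model directory recursively; returns false if it does not exist. */
    static boolean delete(Path modelsRoot, String modelName) throws IOException {
        Path root = modelsRoot.normalize();
        Path target = root.resolve(modelName).normalize();
        // Refuse names that would point at or outside the models folder.
        if (target.equals(root) || !target.startsWith(root) || !Files.isDirectory(target)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(target)) {
            // Reverse order so that files are removed before their parent directories.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : toDelete) {
            Files.delete(path);
        }
        return true;
    }
}
```

Computing the size from the files on disk would avoid storing a hard-coded size, although it only works for models that are already downloaded.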

Additionally, there should be a way to download a model beforehand, e.g. download a model on one computer, then transfer it to another and install it in JabRef.
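One possible shape for such an offline transfer is to pack a model directory into a single ZIP file and unpack it on the other machine. The sketch below uses only `java.util.zip`. The class `ModelArchive` and its methods are hypothetical, and it is an assumption that copying a model's directory as-is is sufficient, since the issue leaves open which files actually need to be transferred.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for moving a downloaded model between computers. */
final class ModelArchive {
    private ModelArchive() {
    }

    /** Packs all regular files below modelDir into zipFile. */
    static void exportModel(Path modelDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(modelDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // ZIP entry names always use forward slashes.
                String name = modelDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    /** Unpacks zipFile into targetDir, rejecting entries that would escape it. */
    static void importModel(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                Files.createDirectories(dest.getParent());
                // Reads only the current entry; the ZIP stream stays open.
                Files.copy(zip, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

A single archive file would also give the "+" dialog something simple to accept as a local model, though the actual import format is a design decision for whoever picks this up.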

Additional context

It seems there are some useful methods in DJL, though they are not documented thoroughly (https://javadoc.io/doc/ai.djl/api/latest/ai/djl/repository/zoo/ModelZoo.html#listModels()). I couldn't quickly grasp how to combine local (downloaded) models with remote ones, but that is probably just a matter of time.

Thi Lo also found a link to the models' metadata (https://mlrepo.djl.ai/model/nlp/text_embedding/ai/djl/huggingface/pytorch/models.json.gz), which should be enough for this purpose.
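Since that metadata file is gzip-compressed, reading it needs one decompression step before any JSON parsing. A minimal sketch with the Java standard library follows. The class `ModelMetadata` and its method are hypothetical, and the sketch deliberately makes no assumption about the structure of the JSON inside.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper for reading gzip-compressed model metadata. */
final class ModelMetadata {
    private ModelMetadata() {
    }

    /** Decompresses a gzip stream and returns its content as UTF-8 text. */
    static String readGzippedText(InputStream compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(compressed)) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The input stream could come from the URL above or from a copy of the file saved earlier. Opening the URL is itself an Internet connection, so it would have to respect the agreement concern from the problem list.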

This is not an easy issue: one needs to create a useful UI. However, it's not debatable, so I posted it here.

Maybe introduce an "Available models" section in the AI preferences with a "+" button that opens a dialog for choosing either a remote embedding model or a local one.
