### Description

This would add support for:

- local LLMs
- local hardware
- GPU acceleration
- custom and additional LLM architectures
### Describe the solution you'd like

Support for applications that adhere to the OpenAI API. The OpenAI API has become an unofficial (de facto) standard for serving LLMs.
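To make the proposal concrete, here is a minimal sketch of what "adhering to the OpenAI API" means in practice: a plain chat completion request against a locally running server. The base URL, model name, and token are placeholders (assumptions), since each application listed below ships its own defaults.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LocalLlmDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder base URL: host, port, and path prefix depend on
        // which local server you run and how it is configured.
        String baseUrl = "http://localhost:8080/v1";

        String body = """
                {
                  "model": "local-model",
                  "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "Summarize this paper's abstract."}
                  ]
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/chat/completions"))
                .header("Content-Type", "application/json")
                // Local servers typically accept any token (or none at all).
                .header("Authorization", "Bearer not-needed")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // On success, the body is a JSON object whose "choices" array holds
        // the generated message, following the OpenAI chat completion format.
        System.out.println(response.body());
    }
}
```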
Progress:
- Llama.cpp
  - JabRef documentation
  - Documentation: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
- GPT4All
  - Support GPT4All Server API #11870
  - Send embeddings to GPT4All's Local API Server #12114
  - system prompt is ignored in API request nomic-ai/gpt4all#1855
  - Chat UI Server: Support listening on address other than localhost nomic-ai/gpt4all#1304
  - Local server not remembering previous messages nomic-ai/gpt4all#2602
  - Convert API server to use OpenAPI spec nomic-ai/gpt4all#3113
  - JabRef documentation
  - Documentation: https://github.com/nomic-ai/gpt4all/wiki/Local-API-Server
- LMStudio
  - JabRef documentation
  - Documentation: https://lmstudio.ai/docs/basics/server
- Ollama
  - JabRef documentation: https://docs.jabref.org/ai/local-llm#step-by-step-guide-for-ollama
  - Documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
- Jan
  - JabRef documentation
  - Documentation: https://jan.ai/integrations/coding/continue-dev#step-2-enable-the-jan-api-server. I think they are using https://github.com/janhq/cortex.cpp?tab=readme-ov-file#overview under the hood.
- KoboldCPP
  - JabRef documentation
  - Documentation: https://lite.koboldai.net/koboldcpp_api
- Llamafile
  - JabRef documentation
  - Documentation: https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#quickstart
Most of the applications listed here are wrappers around llama.cpp, though each has its own strengths and weaknesses. Except for LMStudio, they are all open source. Since they all expose the same OpenAI-style endpoints, switching between them should mostly come down to changing the base URL (see the sketch below).
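A hedged sketch of the per-application base URLs: the ports below are the defaults I could find in each project's documentation at the time of writing; treat them as assumptions and verify them against the links above before relying on them.

```java
import java.util.Map;

public class LocalServerDefaults {
    // Assumed default OpenAI-compatible base URLs per application.
    // Ports and path prefixes may differ per version or configuration.
    static final Map<String, String> BASE_URLS = Map.of(
            "llama.cpp", "http://localhost:8080/v1",
            "GPT4All", "http://localhost:4891/v1",
            "LMStudio", "http://localhost:1234/v1",
            "Ollama", "http://localhost:11434/v1",
            "Jan", "http://localhost:1337/v1",
            "KoboldCPP", "http://localhost:5001/v1",
            "Llamafile", "http://localhost:8080/v1");
}
```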
Related issues:
Notes:

This issue is purely about LLMs, not embedding models. For embedding models, see InAnYan#85 (comment).