
Releases: abgulati/LARS

v2.0-beta6: Major HF-Waitress LLM Server Update

10 Sep 22:44
  • HF-Waitress: /completions_stream now implements a custom TextStreamer so as to redirect only its output to the stream buffer, while STDOUT remains unmodified, allowing other non-blocked routes and methods to execute and print to STDOUT in parallel without interfering with the stream

  • CSS separated into a dedicated file

  • Minor QoL changes
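The STDOUT-redirect idea above can be sketched independently of the transformers API. The `BufferStreamer` class below is a hypothetical stand-in for the custom TextStreamer, assuming generated text arrives one token-string at a time:

```python
import io

class BufferStreamer:
    """Hypothetical stand-in for the custom TextStreamer: generated text
    is written to a per-request buffer instead of STDOUT, so other routes
    can keep printing to STDOUT in parallel."""

    def __init__(self):
        self.buffer = io.StringIO()

    def put(self, token_text):
        # Redirect generated text into the stream buffer only.
        self.buffer.write(token_text)

    def drain(self):
        # Yield and clear whatever has accumulated so far.
        text = self.buffer.getvalue()
        self.buffer = io.StringIO()
        return text

streamer = BufferStreamer()
for tok in ["Hello", ", ", "world"]:
    streamer.put(tok)
print(streamer.drain())  # → Hello, world
```

A route handler would then read from `drain()` to feed the HTTP stream, leaving `print` calls elsewhere in the server untouched.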

Full Changelog: v2.0-beta5...v2.0-beta6

v2.0-beta5: UI Enhancements

09 Sep 22:19
  • New font-family, glassmorphism and title bar

Full Changelog: v2.0-beta4...v2.0-beta5

v2.0-beta4: HQQ Fix and Minor Refinements

06 Sep 23:08
  • BUG FIX: HQQ quantization would error out if torch.dtype (dataType) was set to auto; it now force-sets to torch.bfloat16

  • BUG FIX: Add new LLM button re-displays when the HF-Waitress LLM list is closed and re-opened

  • Minor response-formatting adjustment

Full Changelog: v2.0-beta3...v2.0-beta4

v2.0-beta3

06 Sep 00:17
  • Fixed HF-Waitress streaming-response formatting!

  • Improved app load times via tuned server health-check intervals

  • Minor performance improvement to HF-Waitress streaming-output

  • Minor refinements to HF-Waitress server status outputs

Full Changelog: v2.0-beta2...v2.0-beta3

v2.0-beta2: Enhanced HF-Waitress LLM Management Features, Error-Reporting Refinements and Bug Fixes

05 Sep 00:25
  1. Enhanced HF-Waitress LLM Management: Add new model_ids, search-filter & sort the list of LLMs as well as delete LLM IDs from the HF-Waitress LLM dropdown list
  2. HF-Waitress server health-check reporting improvements
  3. Various bug fixes: Reference to index_dir removed, document_records SQL-DB correctly created on very first run, removed troublesome test-prints during document-chunking operation

Full Changelog: v2.0-beta1...v2.0-beta2

v2.0-beta1: New LLM Server -- HF-Waitress!

30 Aug 01:11

HF-Waitress is a powerful and flexible server application for deploying and interacting with HuggingFace Transformer models. It simplifies the process of running open-source Large Language Models (LLMs) locally on-device, addressing common pain points in model deployment and usage.

This server enables loading HF-Transformer & AWQ-quantized models directly off the hub, while providing on-the-fly quantization via BitsAndBytes, HQQ and Quanto for the former. It negates the need to manually download any model yourself, simply working off the model's name instead. It requires no setup, and provides concurrency and streaming responses all from within a single, easily-portable, platform-agnostic Python script.
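As a rough illustration of the streaming route mentioned above, a client could POST a prompt and read the chunked response line by line. The payload field names (`prompt`, `max_new_tokens`) and the host/port here are placeholders, not the documented hf-waitress schema; consult the project README linked below for the actual API:

```python
import json
import urllib.request

def stream_completion(prompt, host="http://localhost:8000"):
    # Hypothetical payload shape and host/port; see the hf-waitress
    # README for the real field names and defaults.
    payload = json.dumps({"prompt": prompt, "max_new_tokens": 256}).encode()
    req = urllib.request.Request(
        host + "/completions_stream",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for chunk in resp:  # iterate over the chunked response as it arrives
            yield chunk.decode()

# Example request body as it would be serialized over the wire:
body = json.dumps({"prompt": "Hello", "max_new_tokens": 256})
```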

For a full list of features see: https://github.com/abgulati/hf-waitress

As a result, LARS is now far easier to deploy and get working on the very first run, no longer requiring the user to manually download and place their LLMs.

Check out the updated Dependencies, Installation and Usage Instructions in the README

Note: containers are not yet updated; they will most likely be updated in the coming week.

Full Changelog: v1.9.1...v2.0-beta1

v1.9.1 - Re-ranker Robustness & Minor UI Tweak

21 Aug 20:48
  • BUG FIX: Re-ranking is now bypassed when do_rag=False, so the empty document list no longer produces an error!
  • Minor UI change: Adjusted max-width of Settings modal to 75% for better use of available screenspace

Full Changelog: v1.9...v1.9.1

v1.9 - Vector Re-Ranking & No More Whoosh

21 Aug 01:52
  1. Custom document chunker appends page number data as metadata to chunks stored in the vectorDB
  2. LLM can now supply specific document names and page numbers within the response itself!
  3. Re-ranking and filtering applied via SentenceTransformer('all-MiniLM-L6-v2') to the vectorDB similarity search results for better contextual accuracy
  4. Whoosh indexing no longer necessary - far simpler bookkeeping and no overhead for page-number searches at inference time
  5. Page number accuracy significantly increased as a result of all the above
  6. Default system-prompt template now instructs the LLM to include document names and page numbers whenever additional context is provided, actual output dependent on ability of the specific LLM used
  7. BUG FIX: PDF tabs in the document-viewer in the response window did not open properly for consecutive questions and on chat-history load. FIXED.
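The re-ranking step in (3) boils down to scoring each retrieved chunk against the query and keeping only the top hits. A minimal stdlib sketch of that idea, where the toy bag-of-words `embed` stands in for the real SentenceTransformer('all-MiniLM-L6-v2') embeddings:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; stands in for the real
    # SentenceTransformer('all-MiniLM-L6-v2') embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query, chunks, top_k=2):
    # Sort retrieved chunks by similarity to the query, keep the best.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:top_k]

docs = ["cats sleep all day", "dogs bark loudly", "cats and dogs play"]
print(rerank("do cats play", docs, top_k=1))  # → ['cats and dogs play']
```

In LARS the chunks come from the vectorDB similarity search, and re-ranking filters that candidate set for better contextual accuracy before the context reaches the LLM.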

Full Changelog: v1.8...v1.9

v1.8

15 Aug 23:41

MAJOR UPDATE:

  1. Google Drive Integration complete! Downloads files and folders recursively. Filtering, sorting and queued-loading of Google Drive docs are now available via the UI
  2. Improved highlighting: Implemented fuzzy-search logic, replacing exact matching, resulting in expanded highlighting on pages
  3. Improved RAG: Increased the cosine-similarity search threshold to 80% for more stringent and accurate matching, and passing sources data to the LLM for improved response quality
  4. Improved handling of images for citations - skipping image extraction for scanned docs
  5. Clearer document naming in citations: The unique ID of the highlighted document is no longer attached to the document name in the 'Refer to the following documents' citations block
  6. BUG FIX: When using the free-tier of the AzureCV OCR service, it will handle UsageLimitExceeded errors even when submitting multiple documents back-to-back, auto-waiting and resuming correctly
  7. BUG FIX: handle_api_error events will now actually return to the front-end!
  8. Refactored process_new_file method into smaller blocks that are now shared with the GoogleDrive loader and can be used by other integrations in the future too
  9. Increased chunk size to 500 and removed '250' from the name of the SBERT VectorDB created
  10. Cleaned up print and newline statements
  11. Improvements to accuracy and relevance of page numbers and doc names cited in response, further refinements on-going
  12. Replaced the Whoosh indexing search operator, changing it from the default AND to OR
  13. HF-Waitress local-LLM server integration begins!
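The fuzzy-search highlighting in (2) can be approximated with the stdlib's difflib: instead of requiring an exact substring match, score candidate windows of the page text by similarity ratio and highlight the best one above a threshold. The threshold and windowing below are illustrative, not the values LARS uses:

```python
from difflib import SequenceMatcher

def fuzzy_find(needle, haystack, threshold=0.8):
    """Return (start, end) of the window in `haystack` most similar to
    `needle`, or None if nothing clears `threshold`."""
    n = len(needle)
    best_score, best_span = 0.0, None
    # Slide a needle-sized window over the page text.
    for i in range(max(1, len(haystack) - n + 1)):
        window = haystack[i:i + n]
        score = SequenceMatcher(None, needle.lower(), window.lower()).ratio()
        if score > best_score:
            best_score, best_span = score, (i, i + n)
    return best_span if best_score >= threshold else None

# An exact-match search would miss the OCR typo "revnue"; fuzzy matching
# still locates the span to highlight.
page = "The quarterly revnue grew by 12 percent year over year."
span = fuzzy_find("quarterly revenue grew", page)
```

Exact matching fails on any OCR noise or whitespace drift in the extracted text, which is why a similarity-based match expands highlighting coverage.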

Full Changelog: v1.7...v1.8

v1.7

08 Aug 21:54

New models supported - Google Gemma2, DeepSeek V2, Llama-3.1

Revamped Docker builds - new dockerfiles

Pre-built images shared

Various bug-fixes and enhancements

Full Changelog: v1.6...v1.7