remove old files, update README: refactor 2 of 2
MichaelClifford committed Jan 3, 2024
1 parent a8ffd8c commit 70508ba
Showing 9 changed files with 14 additions and 233 deletions.
102 changes: 11 additions & 91 deletions README.md
# Locallm

This repo contains the artifacts required to build and run LLM (Large Language Model) services locally on your Mac using podman. These containerized LLM services can help developers quickly prototype new LLM-based applications without relying on externally hosted services, and, since they are already containerized, they make the move from prototype to production quicker.

This README outlines three different approaches to running the application:
* [Pull and Run](#pull-and-run)
* [Build and Run](#build-and-run)
* [Deploy on OpenShift](#deploy-on-openshift)
## Current Locallm Services:

* [Chatbot](#chatbot)
* [Text Summarization](#text-summarization)
* [Fine-tuning](#fine-tuning)


## Pull and Run

If you have [podman](https://podman-desktop.io/) installed on your Mac and don't want to build anything, you can pull the image directly from my [quay.io](https://quay.io) repository and run the application locally by following the instructions below.

_Note: You can speed up the LLM's responses by increasing the resources allocated to the podman virtual machine._
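For example, you could resize the podman virtual machine along these lines (a sketch; the CPU and memory values are illustrative, not tuned recommendations):

```bash
# Stop the VM, raise its CPU and memory allocation, then restart it.
# --memory is specified in MiB.
podman machine stop
podman machine set --cpus 4 --memory 8192
podman machine start
```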

### Pull the image from quay.io
```bash
podman pull quay.io/michaelclifford/locallm
```
### Run the container
```bash
podman run -it -p 7860:7860 quay.io/michaelclifford/locallm:latest
```

Go to `0.0.0.0:7860` in your browser and start to chat with the LLM.
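Optionally, you can first confirm from a terminal that the container is serving (a quick check, assuming the Gradio app answers plain HTTP requests):

```bash
# Expect an HTTP 200 once the model server has finished loading
curl -I http://0.0.0.0:7860
```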

![](/assets/app.png)

## Build and Run

If you'd like to customize the application or change the model, you can rebuild and run the application using [podman](https://podman-desktop.io/).


_Note: If you would like to build this repo as is, it expects that you have downloaded the [llama-2-7b-chat.Q5_K_S.gguf](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q5_K_S.gguf) model from Hugging Face and saved it into the top directory of this repo._
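For example, the model file could be fetched with curl; this assumes Hugging Face's usual `resolve/main` download URL pattern for the linked repository:

```bash
# Download the (multi-gigabyte) quantized model into the top directory of this repo
curl -L -o llama-2-7b-chat.Q5_K_S.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
```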

### Build the image locally for arm

```bash
podman build -t locallm . -f arm/Containerfile
```

### Run the image

```bash
podman run -it -p 7860:7860 locallm:latest
```

Go to `0.0.0.0:7860` in your browser and start to chat with the LLM.

![](/assets/app.png)

## Deploy on OpenShift

Now that we've developed an application locally that leverages an LLM, we likely want to share it with a wider audience. Let's get it off our machine and run it on OpenShift.

### Rebuild for x86
We'll need to rebuild the image for the x86 architecture for most use cases outside of our Mac. Since this is an AI workload, we will also want to take advantage of NVIDIA GPUs available outside our local machine. Therefore, the base image used here includes CUDA and builds llama.cpp specifically for a CUDA environment.

```bash
podman build -t locallm:x86 . -f x86/Containerfile
```

If you'd like to build **without** CUDA and GPU acceleration, change line 6 of `x86/Containerfile` to set `-DLLAMA_CUBLAS` to `off` before building the image:

```Containerfile
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=off"
```

### Push to Quay

Once you log in to [quay.io](https://quay.io), you can push your own newly built version of this LLM application to your repository for use by others.

```bash
podman login quay.io
```

```bash
podman push localhost/locallm quay.io/<YOUR-QUAY_REPO>/locallm
```

### Deploy

Now that your model lives in a remote repository, we can deploy it. Go to your OpenShift developer dashboard and select "+Add" to use the OpenShift UI to deploy the application.

![](/assets/add_image.png)

Select "Container images"

![](/assets/container_images.png)

Then fill out the form on the Deploy page with your [quay.io](https://quay.io) image name and make sure to set the "Target port" to 7860.

![](/assets/deploy.png)

Hit "Create" at the bottom and watch your application start.

Once the pods are up and the application is working, navigate to the "Routes" section and click on the link created for you to interact with your app.

![](/assets/app.png)
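If you prefer the command line, a rough `oc` equivalent of the steps above might look like this (a sketch, assuming you are already logged in to the cluster; the resource names are illustrative):

```bash
# Create a Deployment from the image pushed to quay.io
oc new-app quay.io/<YOUR-QUAY_REPO>/locallm --name=locallm

# Expose port 7860 as a Service, then publish it externally with a Route
oc expose deployment/locallm --port=7860
oc expose service/locallm
```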
### Chatbot

A simple chatbot using the Gradio UI. Learn how to build and run this model service here: [Chatbot](/chatbot/).

### Text Summarization

An LLM app that can summarize arbitrarily long text inputs. Learn how to build and run this model service here: [Text Summarization](/summarizer/).

### Fine-tuning

This application allows a user to select a model and a dataset to fine-tune that model on. Once it finishes, it outputs a new fine-tuned model for the user to apply to other LLM services. Learn how to build and run this model training job here: [Fine-tuning](/finetune/).
9 changes: 0 additions & 9 deletions arm/Containerfile

This file was deleted.

2 changes: 1 addition & 1 deletion chatbot/model_services/builds/arm/Containerfile
@@ -4,7 +4,7 @@ COPY builds/requirements.txt /locallm/requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --upgrade -r /locallm/requirements.txt
ENV MODEL_FILE=llama-2-7b-chat.Q5_K_S.gguf
-COPY builds/${MODEL_FILE} /locallm/
+COPY builds/${MODEL_FILE} /locallm/models/
COPY builds/src/ /locallm
COPY chat_service.py /locallm/chat_service.py
ENTRYPOINT [ "python", "chat_service.py" ]
2 changes: 1 addition & 1 deletion chatbot/model_services/builds/x86/Containerfile
@@ -6,7 +6,7 @@ ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
ENV FORCE_CMAKE=1
RUN pip install --upgrade --force-reinstall --no-cache-dir -r /locallm/requirements.txt
ENV MODEL_FILE=llama-2-7b-chat.Q5_K_S.gguf
-COPY builds/${MODEL_FILE} /locallm/
+COPY builds/${MODEL_FILE} /locallm/models/
COPY builds/src/ /locallm
COPY chat_service.py /locallm/chat_service.py
ENTRYPOINT [ "python", "chat_service.py" ]
2 changes: 1 addition & 1 deletion chatbot/model_services/chat_service.py
@@ -5,7 +5,7 @@
from llamacpp_utils import clip_history


-llm = Llama("llama-2-7b-chat.Q5_K_S.gguf",
+llm = Llama("models/llama-2-7b-chat.Q5_K_S.gguf",
n_gpu_layers=-1,
n_ctx=2048,
max_tokens=512,
9 changes: 0 additions & 9 deletions src/app.py

This file was deleted.

98 changes: 0 additions & 98 deletions src/chat.py

This file was deleted.

11 changes: 0 additions & 11 deletions src/run_locallm.py

This file was deleted.

12 changes: 0 additions & 12 deletions x86/Containerfile

This file was deleted.
