Update readmes #35

Merged: 2 commits, Feb 2, 2024
11 changes: 0 additions & 11 deletions Makefile

This file was deleted.

7 changes: 5 additions & 2 deletions chatbot-langchain/README.md
@@ -7,9 +7,12 @@ podman build -t stchat . -f builds/Containerfile
```
### Run image locally

-Make sure your model service is up and running before starting this container image.
+Make sure the playground model service is up and running before starting this container image.
+To start the model service, refer to [the playground document](../playground/README.md)


```bash
-podman run -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://10.88.0.1:8001/v1 stchat
+podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://10.88.0.1:8001/v1 stchat
```

Interact with the application from your local browser at `localhost:8501`
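Once the container is running, a quick smoke test can confirm the UI is reachable before opening the browser. This is only a sketch: the `/_stcore/health` path is an assumption based on recent Streamlit releases (older ones used `/healthz`).

```shell
# Probe the Streamlit app on the published port; prints "up" on success.
# The /_stcore/health route is assumed from recent Streamlit versions.
curl -sf http://localhost:8501/_stcore/health >/dev/null && echo "up"
```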
4 changes: 2 additions & 2 deletions finetune/README.md
@@ -26,7 +26,7 @@ This also assumes that `<location/of/your/data/>` contains the following 2 files
### Run the image

```bash
-podman run -it -v <location/of/your/data/>:/locallm/data/ finetunellm
+podman run --rm -it -v <location/of/your/data/>:/locallm/data/ finetunellm
```
This will run 10 iterations of LoRA finetuning and generate a new model that can be exported and used in another chat application. I'll caution that 10 iterations is likely insufficient to see a real change in the model outputs, but it serves here for demo purposes.

@@ -56,4 +56,4 @@ podman run -it -v <location/of/your/data/>:/locallm/data/ \
-e NEW_MODEL=<name-of-new-finetuned-model.gguf>
finetunellm

-```
+```
36 changes: 36 additions & 0 deletions playground/README.md
@@ -0,0 +1,36 @@
### Build Model Service

From this directory,

```bash
podman build -t playground:image .
```

### Download Model

At the time of this writing, two models are known to work with this service:

- **Llama2-7b**
- Download URL: [https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf)
- **Mistral-7b**
- Download URL: [https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf)

```bash
cd ../models
wget <Download URL>
cd ../
```
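For example, fetching the Mistral-7b build listed above might look like this (the filename is taken from its download URL; adjust if you pick Llama2-7b instead):

```shell
cd ../models
# Multi-GB download; the filename is the last path segment of the URL above.
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf
cd ../
```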

### Deploy Model Service

Deploy the LLM server and volume mount the model of choice.

```bash
podman run --rm -it -d \
-p 8001:8001 \
-v Local/path/to/locallm/models:/locallm/models:ro,Z \
-e MODEL_PATH=models/<model-filename> \
-e HOST=0.0.0.0 \
-e PORT=8001 \
playground:image
```
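After the container starts, a liveness probe is worth running. This sketch assumes the service exposes an OpenAI-compatible API under `/v1` (the chat app's `MODEL_SERVICE_ENDPOINT` points at a `/v1` path, which suggests it does):

```shell
# Assumption: the model service speaks the OpenAI-compatible /v1 API.
# A healthy server should answer with JSON describing the loaded model(s).
curl -s http://localhost:8001/v1/models
```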
9 changes: 4 additions & 5 deletions rag-langchain/README.md
@@ -8,12 +8,11 @@ This example will deploy a local RAG application using a chromadb server, a llam
Use the existing ChromaDB image to deploy a vector store service.

* `podman pull chromadb/chroma`
-* `podman run -it -p 8000:8000 chroma`
+* `podman run --rm -it -p 8000:8000 chroma`
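To verify the vector store is up before wiring the app to it, ChromaDB builds of this era expose a heartbeat endpoint; the exact route is an assumption and may differ across releases:

```shell
# Assumed ChromaDB heartbeat route; a healthy server returns a small JSON
# payload with a nanosecond timestamp.
curl -s http://localhost:8000/api/v1/heartbeat
```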

### Deploy Model Service

Deploy the LLM server and volume mount the model of choice.
-* `podman run -it -p 8001:8001 -v Local/path/to/locallm/models:/locallm/models:Z -e MODEL_PATH=models/llama-2-7b-chat.Q5_K_S.gguf -e HOST=0.0.0.0 -e PORT=8001 playground`
+To start the model service, refer to [the playground model-service document](../playground/README.md)

### Build and Deploy RAG app
Deploy a small application that can populate the vectorDB and generate a response with the LLM.
@@ -31,5 +30,5 @@ snapshot_download(repo_id="BAAI/bge-base-en-v1.5",
Follow the instructions below to build your container image and run it locally.

* `podman build -t ragapp rag-langchain -f rag-langchain/builds/Containerfile`
-* `podman run -it -p 8501:8501 -v Local/path/to/locallm/models/:/rag/models:Z -v Local/path/to/locallm/data:/rag/data:Z ragapp -- -H 10.88.0.1 -m http://10.88.0.1:8001/v1`
+* `podman run --rm -it -p 8501:8501 -v Local/path/to/locallm/models/:/rag/models:Z -v Local/path/to/locallm/data:/rag/data:Z ragapp -- -H 10.88.0.1 -m http://10.88.0.1:8001/v1`

2 changes: 1 addition & 1 deletion summarizer/README.md
@@ -40,7 +40,7 @@ The user should provide the model name, the architecture and image name they wan
Once the model service image is built, it can be run with the following:

```bash
-podman run -it -p 7860:7860 summarizer
+podman run --rm -it -p 7860:7860 summarizer
```
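Port 7860 suggests a Gradio UI; a minimal check that the app is serving, making no assumptions about specific routes:

```shell
# Confirm something is answering on 7860; -I fetches the headers only,
# and head keeps just the HTTP status line.
curl -sI http://localhost:7860 | head -n 1
```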
### Interact with the app
