Skip to content

Commit 6280e28

Browse files
authored
Refactors custom apps (#18)
1 parent 7a292d2 commit 6280e28

27 files changed

+645
-797
lines changed

README.rst

Lines changed: 8 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ Once you have Docker, Docker Compose, and Python3 installed, you can download an
6464
6565
git clone https://github.com/modulos/data_copilot.git
6666
cd data_copilot
67-
pip install -r requirements_dev.txt
67+
pip install ".[dev]"
6868
make setup
6969
7070
**Open Data Copilot in your browser: http://localhost:80**
@@ -210,18 +210,14 @@ This command will start the Data Copilot service in development mode. Now, whene
210210

211211
Data Copilot is not just a standalone application, but also a framework that you can use to build your own data processing and analysis tools. Here are the steps to get started:
212212

213-
1. **Worker Logic:** The worker logic can be found in the `celery_app/apps` directory. You can modify the logic here to suit your specific needs.
214-
215-
2. **Getting Started Example:** For a basic understanding of the worker logic, you can refer to the `celery_app/apps/getting_started_example.py` file. This file provides a simple example that can serve as a starting point for your custom logic.
216-
217-
3. **Executor Logic:** The executor logic is contained in the `celery_app/executors/getting_started_executor.py` file. You can modify this file to customize how tasks are executed.
218-
219-
4. **Supported File Types:** If you want to change the supported file types (e.g., extend support to PDF), you will need to configure this on the backend side in the `backend/config/config.py` file. Additionally, you need to implement the logic for handling the new file type in the `backend/routers/artifacts.py` file.
220-
221-
5. **File Type Interaction:** Once you've configured the backend to support the new file type, you'll need to implement the specific logic for interacting with that file type on the worker side.
222-
223-
6. **Return Types:** Currently, Data Copilot is configured to only return tables to the user. However, the framework supports other return types such as heatmaps, histograms, and barplots. You can see the implementation details for these types in the `getting_started_executor.py` file.
213+
1. **Logic:** All the logic of your app should be in the `data_copilot/execution_apps/apps` directory. You can modify the logic here to suit your specific needs. You can inherit from the `data_copilot.exection_apps.base.DataCopilotApp` class. You need to implement at least the three static methods:
214+
- `supported_file_types`: This method should return a dict of the supported file types. The keys should be the identifier and the value the content-type of the file type.
215+
- `process_data_upload`: This method gets a list of the FastAPI `UploadFile` objects and should return a list of dict where the key is the file name and the value the content of the file as BufferedIOBase
216+
- `execute_message`: This method contnains the execution logic of your app, which gets executed on the worker.
224217

218+
2. **Message** The execution_message should return a `data_copilot.execution_apps.helpers.Message` object.
219+
220+
225221
With these steps, you can customize Data Copilot to handle your specific data processing and analysis tasks. Remember to thoroughly test your changes to ensure they work as expected.
226222

227223

data_copilot/backend/config/config.py

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -19,13 +19,11 @@ class Config(BaseSettings):
1919
"application/vnd.ms-excel": "xls",
2020
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
2121
}
22+
COMPUTE_BACKEND: str = Field(..., env="COMPUTE_BACKEND")
2223

2324
if os.getenv("ENVIRONMENT") != "TEST":
2425
DB_CONNECTION_STRING: str = Field(..., env="DB_CONNECTION_STRING")
2526
SECRET_KEY: str = Field(..., env="JWT_SECRET_KEY")
26-
AZURE_STORAGE_ACCOUNT_NAME: str = Field(..., env="AZURE_STORAGE_ACCOUNT_NAME")
27-
AZURE_STORAGE_ACCOUNT_KEY: str = Field(..., env="AZURE_STORAGE_ACCOUNT_KEY")
28-
CONTAINER_NAME: str = Field(..., env="CONTAINER_NAME")
2927
STORAGE_BACKEND: str = Field(..., env="STORAGE_BACKEND")
3028
BACKEND_HOST: str = Field(..., env="BACKEND_HOST")
3129

@@ -35,9 +33,14 @@ class Config(BaseSettings):
3533
secret_key = "c86d3444a380bb36cca7abe1b6dcc8caaee0ecf5bbe254c5473783d147ebc12e"
3634
SECRET_KEY: str = secret_key
3735
BACKEND_HOST: str = "http://localhost:8000/api"
38-
STORAGE_BACKEND: str = "./artifacts/"
36+
STORAGE_BACKEND: str = "file://./artifacts/"
3937
CELERY_BROKER_URL: str = ""
4038

39+
if "dfs.core.windows.net" in os.getenv("STORAGE_BACKEND"):
40+
AZURE_STORAGE_ACCOUNT_NAME: str = Field(..., env="AZURE_STORAGE_ACCOUNT_NAME")
41+
AZURE_STORAGE_ACCOUNT_KEY: str = Field(..., env="AZURE_STORAGE_ACCOUNT_KEY")
42+
CONTAINER_NAME: str = Field(..., env="CONTAINER_NAME")
43+
4144
@property
4245
def POSTGRES_CONNECTION(self):
4346
return self.DB_CONNECTION_STRING

data_copilot/backend/routers/artifacts.py

Lines changed: 6 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,6 @@
1-
import json
2-
import logging
31
import os
42
from typing import Optional
53

6-
import pandas as pd
74
from fastapi import Depends, HTTPException, UploadFile, routing
85

96
from data_copilot.backend.artifacts.artifact import CreateArtifactVersionCM
@@ -37,6 +34,8 @@
3734
)
3835
from data_copilot.backend.schemas.authentication import User
3936
from data_copilot.storage_handler import get_signed_download_url, list_files
37+
from data_copilot.execution_apps import get_app
38+
4039

4140
artifacts_router = routing.APIRouter(prefix="/artifacts")
4241

@@ -321,45 +320,15 @@ async def post_artifacts_id_artifactid_versions(
321320
status_code=400, detail="Only dataset artifact versions can be uploaded"
322321
)
323322

324-
if file.content_type not in CONFIG.ALLOWED_ARTIFACTS_CONTENT_TYPES.keys():
323+
if file.content_type not in get_app().supported_file_types.keys():
325324
raise routing.HTTPException(
326325
status_code=415, detail=f"File format '{file.content_type}' not allowed"
327326
)
328327

329-
try:
330-
file_type = CONFIG.ALLOWED_ARTIFACTS_CONTENT_TYPES[file.content_type]
331-
match file_type:
332-
case "csv":
333-
data_frame = pd.read_csv(file.file, sep=None, encoding="utf-8-sig")
334-
case "xls" | "xlsx":
335-
data_frame = pd.read_excel(await file.read(), dtype={"dteday": str})
336-
337-
if data_frame.empty:
338-
raise routing.HTTPException(
339-
status_code=400, detail=f"Wrong '{file.file}'content"
340-
)
341-
342-
except Exception as e:
343-
logging.error(e)
344-
raise routing.HTTPException(
345-
status_code=500, detail=f"Loading '{file.content_type}' failed"
346-
)
347-
schema = {col: str(data_frame[col].dtype) for col in data_frame.columns}
348-
file_config = {
349-
"file_name": file.filename,
350-
"file_type": file_type,
351-
"file_schema": schema,
352-
"rows": data_frame.shape[0],
353-
}
354-
file.file.seek(0)
355328
with CreateArtifactVersionCM(artifact.id, db) as cm:
356-
cm.write(file.filename, file.file)
357-
artifact_version_config = {
358-
"artifact_id": str(artifact.id),
359-
"artifact_version_id": str(cm.uuid),
360-
"files": [file_config],
361-
}
362-
cm.write("config.json", json.dumps(artifact_version_config, indent=4))
329+
files = get_app().process_data_upload([file], artifact, cm)
330+
for file in files:
331+
cm.write(*file)
363332
uuid = cm.uuid
364333

365334
return crud_get_artifact_version(db, uuid)

data_copilot/backend/routers/chats.py

Lines changed: 15 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,4 @@
11
import asyncio
2-
import json
3-
import os
42
import time
53

64
from fastapi import Depends, HTTPException, routing
@@ -46,11 +44,6 @@
4644
UpdateChat,
4745
parse_chat_message,
4846
)
49-
from data_copilot.storage_handler.functions import (
50-
exists,
51-
get_signed_download_url,
52-
read_file,
53-
)
5447

5548
CONFIG = Config()
5649

@@ -293,7 +286,6 @@ async def post_chats_chatid_messages_messageid(
293286
message (Message): The message to be executed
294287
295288
"""
296-
artifact_config = {}
297289
if message.artifact_version_id:
298290
artifact_version: ArtifactVersion = await get_artifact_version_dependency(
299291
message.artifact_version_id, db=db
@@ -304,39 +296,30 @@ async def post_chats_chatid_messages_messageid(
304296
await check_if_user_has_access_to_artifact(artifact, current_user)
305297
await check_if_artifact_is_active(artifact)
306298

307-
if not exists(os.path.join(artifact_version.artifact_uri, "config.json")):
308-
raise HTTPException(
309-
status_code=400,
310-
detail="The artifact version must contain a config.json file",
311-
)
299+
artifact_version_uri = artifact_version.artifact_uri if artifact_version else None
312300

313-
artifact_config = json.load(
314-
read_file(os.path.join(artifact_version.artifact_uri, "config.json"))
315-
)
316-
file_name = artifact_config.get("files", [dict()])[0].get("file_name", None)
317-
if not exists(os.path.join(artifact_version.artifact_uri, file_name)):
318-
raise HTTPException(
319-
status_code=400,
320-
detail=f"The artifact version must contain a file named {file_name}",
321-
)
301+
previous_messages = crud_get_messages_by_chat_id_sorted_desc(
302+
db,
303+
chat_id=message.chat_id,
304+
limit=10,
305+
offset=0,
306+
)
322307

323-
sas_url = get_signed_download_url(
324-
os.path.join(artifact_version.artifact_uri, file_name)
325-
)
326-
artifact_version_id = artifact_version.id
327-
else:
328-
sas_url = None
329-
artifact_version_id = None
308+
# make json from previous messages
309+
previous_messages = [
310+
parse_chat_message(Message.from_orm(message)).dict()
311+
for message in previous_messages
312+
]
330313

331314
execution_app.send_task(
332315
"execute_user_message",
333316
args=(
334317
message.content,
335318
message.chat_id,
336319
message.id,
337-
artifact_version_id,
338-
sas_url,
339-
artifact_config,
320+
artifact_version.id,
321+
artifact_version_uri,
322+
previous_messages,
340323
),
341324
)
342325

data_copilot/celery_app/apps/getting_started_example.py

Lines changed: 0 additions & 173 deletions
This file was deleted.

0 commit comments

Comments
 (0)