Add serverless endpoints for semantic search and summarization (#74)
sawyerh committed Jul 10, 2023
1 parent 1d72311 commit 53f5315
Showing 35 changed files with 11,719 additions and 1,273 deletions.
6 changes: 3 additions & 3 deletions .vscode/settings.json
@@ -1,11 +1,11 @@
{
  "typescript.tsdk": "node_modules/typescript/lib",
  "python.analysis.extraPaths": [
    "aws/search/.venv/lib/python3.10/site-packages"
  ],
  "python.analysis.extraPaths": ["aws/ai/.venv/lib/python3.10/site-packages"],
  "python.linting.mypyEnabled": false,
  "python.linting.enabled": true,
  "python.linting.flake8Enabled": true,
  "python.linting.flake8Path": ".venv/bin/flake8",
  "python.linting.flake8Args": ["--config", "setup.cfg"],
  "python.formatting.provider": "black",
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
9 changes: 8 additions & 1 deletion Makefile
@@ -1,8 +1,15 @@
.DEFAULT_GOAL := init

init:
	make py-init
	make js-init

js-init:
	npm install
	poetry install

py-init:
	poetry lock --no-update
	poetry install --sync

py-format: # Format the code
	poetry run black .
20 changes: 20 additions & 0 deletions aws/ai/.gitignore
@@ -0,0 +1,20 @@
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Serverless directories
.serverless
19 changes: 19 additions & 0 deletions aws/ai/Makefile
@@ -0,0 +1,19 @@
init:
	poetry lock --no-update
	poetry install --sync

deploy:
	npx serverless deploy --verbose --stage production

create-embeddings: # Bulk-create the embeddings for a Firestore export
	[ -n "${OPENAI_API_KEY}" ] || (echo "OPENAI_API_KEY is not set" && exit 1)
	poetry run python embeddings/bulk_create_embeddings.py

logs:
	npx serverless logs --function ai --tail --stage production

test:
	poetry run pytest -vv --capture=no

test-watch:
	poetry run pytest-watch -v
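
Reviewer note (not part of the commit): the `create-embeddings` target fails fast when `OPENAI_API_KEY` is unset or empty. `embeddings/bulk_create_embeddings.py` is not shown in this excerpt, so the following is only a sketch of how a script could repeat the same guard when it is run without `make`. The helper name `require_env` is hypothetical.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return an environment variable's value, or raise if it is unset or empty.

    Hypothetical helper that mirrors the Makefile's `[ -n "${OPENAI_API_KEY}" ]`
    check, which also rejects an empty value.
    """
    value = environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value
```

A script could call `require_env("OPENAI_API_KEY")` once at startup, before making any API requests, so that a missing key surfaces immediately instead of partway through a bulk run.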
24 changes: 24 additions & 0 deletions aws/ai/README.md
@@ -0,0 +1,24 @@
# Serverless AI

This directory contains the code supporting features like Semantic Search and Summarization.

## Initial environment setup

1. Add a **plaintext** secret to AWS Secrets Manager with the name `Highlights/OpenAI-API-Key`. Set the OpenAI API key as the plaintext value.
1. Create the initial set of embeddings.
   1. [Export the Firestore data](../../firebase/exporter/instructions.md)
   1. Place the Firestore export in the `tmp` directory
   1. Run `export OPENAI_API_KEY=[your key here]`
   1. Run `make create-embeddings`
1. Upload the outputted `tmp/embeddings.parquet` to S3

## Resources

- Lambda management is via [Serverless Framework](https://www.serverless.com/framework/docs)
- [Powertools for AWS Lambda](https://docs.powertools.aws.dev/lambda/python/latest/) provides a toolkit to implement Serverless best practices and increase developer velocity
- [AWS SDK for pandas](https://aws-sdk-pandas.readthedocs.io/en/stable/index.html) provides a toolkit to read and write Pandas DataFrames to and from AWS data stores like S3.
- Test mocking and spying is via [`pytest-mock`](https://pytest-mock.readthedocs.io/en/latest/usage.html)

## Usage

Reference the `Makefile` for the full list of commands. More commands are available via the Serverless CLI.
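
Reviewer note (not part of the commit): `services/search.py` is not shown in this excerpt, so how `search_highlights` ranks results is not visible here. As a rough illustration of the general idea behind semantic search over precomputed embeddings, the sketch below ranks items by cosine similarity using only the standard library. The function names, the toy vectors, and the `top_k` parameter are all hypothetical and are not taken from the repository.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: List[float],
    embeddings: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (key, score) pairs, highest similarity first."""
    scored = [
        (key, cosine_similarity(query_embedding, vector))
        for key, vector in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
toy_embeddings = {
    "highlight-a": [1.0, 0.0, 0.0],
    "highlight-b": [0.0, 1.0, 0.0],
    "highlight-c": [0.7, 0.7, 0.0],
}
ranked = rank_by_similarity([1.0, 0.1, 0.0], toy_embeddings, top_k=2)
```

Here `ranked` lists `highlight-a` first and `highlight-c` second, because the query vector points almost entirely along the first axis. In a real deployment the query would first be converted to an embedding by the same model that produced the stored vectors, and the stored vectors would come from the uploaded `embeddings.parquet` file rather than a literal dictionary.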
48 changes: 48 additions & 0 deletions aws/ai/handler.py
@@ -0,0 +1,48 @@
from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.event_handler import LambdaFunctionUrlResolver
from aws_lambda_powertools.event_handler.exceptions import BadRequestError
from aws_lambda_powertools.utilities.typing import LambdaContext
from services.search import search_highlights
from services.summarize import summarize_volume

app = LambdaFunctionUrlResolver(
    debug=False,
)
logger = Logger()
metrics = Metrics()
tracer = Tracer()


@app.get("/search", compress=True)
@tracer.capture_method
def get_search():
    query = app.current_event.get_query_string_value("query")
    logger.info("Received search request", extra={"query": query})

    if not query:
        raise BadRequestError("Missing query parameter")

    results = search_highlights(query)

    return {"message": "Search executed successfully", "data": results}


@app.get("/summarize", compress=True)
@tracer.capture_method
def get_summarize():
    volume_key = app.current_event.get_query_string_value("volume_key")
    logger.info("Received summarization request", extra={"volume_key": volume_key})

    if not volume_key:
        raise BadRequestError("Missing volume_key parameter")

    results = summarize_volume(volume_key)

    return {"message": "Summarization executed successfully", "data": results}


@logger.inject_lambda_context
@metrics.log_metrics  # ensures metrics are flushed upon request completion/failure
@tracer.capture_lambda_handler
def lambda_handler(event, context: LambdaContext):
    return app.resolve(event, context)
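
Reviewer note (not part of the commit): to exercise the `/search` and `/summarize` routes locally, a test needs an event shaped like the one a Lambda function URL delivers, which follows the payload format version 2.0 layout. The sketch below builds a minimal GET event with the standard library only. The helper name `build_function_url_event` is hypothetical, and the set of fields is an assumption about the minimum the resolver needs; a real event carries many more.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def build_function_url_event(
    path: str, query: Optional[Dict[str, str]] = None
) -> Dict[str, Any]:
    """Build a minimal GET event in the payload format version 2.0 shape."""
    query = query or {}
    return {
        "version": "2.0",
        "routeKey": "$default",
        "rawPath": path,
        "rawQueryString": urlencode(query),
        "headers": {},
        # Assumption: the key is left out when there are no query parameters
        **({"queryStringParameters": query} if query else {}),
        "requestContext": {
            "http": {"method": "GET", "path": path},
            "stage": "$default",
        },
        "isBase64Encoded": False,
    }


search_event = build_function_url_event("/search", {"query": "stoic philosophy"})
missing_query_event = build_function_url_event("/search")
```

With the project's dependencies installed, a test could pass `search_event` to `lambda_handler` together with a stand-in `context` object exposing the attributes Powertools reads, after patching `handler.search_highlights` (for example with `pytest-mock`'s `mocker.patch`) so that no network call is made. Passing `missing_query_event` should instead produce a 400 response, since the route raises `BadRequestError` when `query` is absent.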
1 comment on commit 53f5315

@vercel vercel bot commented on 53f5315 Jul 10, 2023
