This repository was archived by the owner on Nov 1, 2024. It is now read-only.

AudioGen - Implemented Activities Audio Generation [DRAFT PR] #120

Open: wants to merge 58 commits into base: main
b130194
Audiocraft CLI Directory & ReadME
Nate8888 Oct 20, 2023
fa305d4
initial files
Nate8888 Oct 20, 2023
3239fb8
Adds package setup & main driver code
Nate8888 Oct 20, 2023
3abca99
Adds initial testing framework pytest
Nate8888 Oct 20, 2023
c8d6876
Test initial workflow for AudioGen
Nate8888 Oct 20, 2023
5a7e045
improved setup + added requirements
Nate8888 Oct 27, 2023
d9b6c48
Implements --description & --duration for audiogen
Nate8888 Oct 27, 2023
a6eeb30
tests the creation of the audio file with desc
Nate8888 Oct 27, 2023
091c0a4
test workflow with torch install before audiocraft
Nate8888 Oct 27, 2023
c314e4f
test workflow with different distribution of torch
Nate8888 Oct 27, 2023
6a0ce73
[workflow] - try raw torch, vision, audio
Nate8888 Oct 27, 2023
3c32fda
[workflow] - Try downgrading Python
Nate8888 Oct 27, 2023
dc2419c
[workflow] - downgrade to match audiocraft + index
Nate8888 Oct 27, 2023
15bf85e
[Workflow] adds triple verbose to pytest
Nate8888 Oct 27, 2023
3e90fce
tries self-hosted runner on Google Colab
Nate8888 Oct 27, 2023
fbed041
test only file creation
Nate8888 Oct 27, 2023
4e4f97c
Refactors code, changes argparse to @click, Docstr
Nate8888 Nov 3, 2023
3532104
Changes entry point
Nate8888 Nov 3, 2023
f4e3ca6
adds batch functionality with file input
Nate8888 Nov 3, 2023
10cc1be
Checks if file was created
Nate8888 Nov 3, 2023
730579b
linting + consistency
Nate8888 Nov 3, 2023
acb9b64
README instructions
Nate8888 Nov 3, 2023
a11e54d
Switch from labgraph_audiogen to lg_audiogen
Nate8888 Nov 17, 2023
1317661
Add versions + Improve descriptions
Nate8888 Nov 17, 2023
9d5bebf
Adds ffmpeg to fix workflow
Nate8888 Nov 17, 2023
ecb2d04
fix package name to lg_audiogen
Nate8888 Nov 17, 2023
94aee5c
Adds O.S Support on ReadME
Nate8888 Nov 17, 2023
d5e347a
Improve ReadME with samples + batch instructions
Nate8888 Nov 17, 2023
abfc11e
Add Calendar Reader Utility
Nate8888 Nov 27, 2023
3d7bdf9
Builds calendar event dictionary
Nate8888 Nov 27, 2023
0fc95f9
handles recurring events and set event limits
Nate8888 Nov 27, 2023
2d06315
Add Year limitation for non-recurring events
Nate8888 Nov 27, 2023
05b9554
Speeds up loop by breaking from rrule generator
Nate8888 Nov 27, 2023
112db66
Refactor code into functions & remove redundancies
Nate8888 Nov 27, 2023
97594dd
Functions to get the events in given dates
Nate8888 Nov 27, 2023
bd1b7ff
Adds ts to sort events to generate audio in order
Nate8888 Nov 27, 2023
ec234f0
initial keyword-based prompt generator
Nate8888 Dec 1, 2023
f9d56ee
Keyword dict to JSON, adds file load fallback
Nate8888 Dec 1, 2023
c694e4e
Adds more variety to keywords & prompts
Nate8888 Dec 1, 2023
665589e
given events, get potential prompts randomly
Nate8888 Dec 1, 2023
f4a2887
Adds the option to generate deterministic queries
Nate8888 Dec 1, 2023
6ab2189
introduces gpt functionality to generate prompts
Nate8888 Dec 1, 2023
49abd30
adds .env format for gpt generation
Nate8888 Dec 1, 2023
80e1350
adds openai module to the project
Nate8888 Dec 1, 2023
da181f2
adds icalendar module to the package setup
Nate8888 Dec 1, 2023
de9edf9
Adds cli opts for activities, gpt, date, random
Nate8888 Dec 1, 2023
2b8b288
Converting activities to sounds complete
Nate8888 Dec 1, 2023
62a15bd
adds dotenv to setup
Nate8888 Dec 1, 2023
e34726e
fix context query and code pruning
Nate8888 Dec 1, 2023
8a4782b
reappropriate keyword generator
Nate8888 Dec 1, 2023
d535a36
Fix sample prompt
Nate8888 Dec 1, 2023
3779996
adds docstrings
Nate8888 Dec 1, 2023
b73f1cd
Removes unused code, Adds docstrings
Nate8888 Dec 1, 2023
ee64dbc
Improves GPT model + context
Nate8888 Dec 1, 2023
5c11458
downgrade to cheaper model
Nate8888 Dec 1, 2023
ceaa86f
Adds Test Case to check activity functionality
Nate8888 Dec 1, 2023
af23faa
Adds ReadME with new functionalities & use cases
Nate8888 Dec 1, 2023
2297f2e
Fix activities in the ReadME
Nate8888 Dec 1, 2023
31 changes: 31 additions & 0 deletions .github/workflows/labgraph_audiogen.yml
@@ -0,0 +1,31 @@
name: AudioGen Tests

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v2

    - name: Setup Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.8'

    - name: Install dependencies
      run: |
        cd extensions/lg_audiogen
        python -m pip install --upgrade pip
        sudo apt-get install ffmpeg
        pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
        pip install --pre xformers
        pip install -e .
        pip install pytest

    - name: Run tests
      run: |
        cd extensions/lg_audiogen
        pytest -vvv
82 changes: 82 additions & 0 deletions extensions/lg_audiogen/README.md
@@ -0,0 +1,82 @@
# Audiogen

Audiogen is a Python command-line tool that uses models from Audiocraft's AudioGen to generate audio from text descriptions. It can generate a single piece of audio from one description, multiple pieces of audio from a batch file of descriptions, or audio based on activities given as a comma-separated string or an `.ics` calendar file.

## Features

* Ability to specify duration of the generated audio.
* Ability to generate audio based on a batch file.
* Ability to specify the model to be used for the audio generation.
* Ability to set the output file name.
* Ability to generate audio based on daily activities from a comma-separated string or an `.ics` calendar file.
* Ability to integrate with GPT models to enhance activity descriptions.
* Ability to enable pseudo-deterministic activity prompts.
* Ability to specify a date or a range of dates to get events from the `.ics` calendar file.

## Setup

Audiocraft needs Python 3.8 or higher to run. If you have a suitable version of Python installed, you can install Audiogen with pip:

```shell
pip install -e .
```

## Usage

### Command-line interface

The CLI usage for Audiogen is `lg_audiogen [OPTIONS] [DESCRIPTION]...`.

### Options

* `description`: the description from which the audio is generated.
* `duration, -d`: duration of the generated audio in seconds, default is 5.
* `model, -m`: name of the Audiocraft AudioGen model to use, default is 'facebook/audiogen-medium'.
* `output, -o`: name of the output file.
* `batch, -b`: name of a file containing one audio description per line.
* `activities, -a`: comma-separated string or `.ics` calendar file containing events.
* `gpt, -gpt`: New: flag to enable a GPT model for enhancing activity descriptions.
* `deterministic, -deterministic`: New: flag to enable deterministic generation.
* `dates, -dt`: New: date in the format 'YYYY-MM-DD' or as a range 'YYYY-MM-DD,YYYY-MM-DD'.
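
To illustrate the `dates` format listed above, here is a minimal sketch of how a single date or a comma-separated range could be parsed. `parse_dates` is a hypothetical helper written for this example, not a function from `lg_audiogen`, and the CLI itself may handle the value differently.

```python
from datetime import datetime


def parse_dates(value):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD,YYYY-MM-DD' into (start, end) dates.

    Hypothetical helper for illustration only; not part of lg_audiogen.
    """
    parts = [part.strip() for part in value.split(",")]
    if len(parts) not in (1, 2):
        raise ValueError("expected one date or two comma-separated dates")
    # strptime raises ValueError for anything that is not YYYY-MM-DD
    parsed = [datetime.strptime(part, "%Y-%m-%d").date() for part in parts]
    start, end = parsed[0], parsed[-1]  # a single date is its own range
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end


if __name__ == "__main__":
    print(parse_dates("2023-11-29"))
    print(parse_dates("2023-11-29,2023-12-01"))
```

Treating a single date as a one-day range keeps the downstream lookup identical for both forms.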

### Example

To generate audio, use one of the following commands:

```shell
lg_audiogen -d 5 -m 'facebook/audiogen-medium' -o 'my_output' 'dog barking'

lg_audiogen 'dog barking'

lg_audiogen -b 'batch.txt'

lg_audiogen -a 'meeting with nathan, lunch with friends' -gpt -deterministic

lg_audiogen -a "calendar.ics" -gpt -dt '2023-11-29,2023-12-01'
```

**Note:** for GPT usage, create a `.env` file with the same format as the `sample.env` file provided.

### Batch File Format

The batch file should contain one description per line. The descriptions should be in the same format as the descriptions used in the command-line interface.

Example:

*batch.txt*
```txt
Natural sounds of a rainforest
Bird Chirping in the background
```
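
As a rough sketch of the format above, a batch file can be turned into a list of descriptions by taking one entry per line. `read_descriptions` is a hypothetical name, and skipping blank lines is an assumption of this example rather than documented behavior of the tool.

```python
def read_descriptions(lines):
    """Return stripped, non-empty descriptions, one per input line.

    Hypothetical helper; skipping blank lines is an assumption.
    """
    return [line.strip() for line in lines if line.strip()]


# Same content as the batch.txt example, given inline as a string
batch_text = "Natural sounds of a rainforest\nBird Chirping in the background\n"
descriptions = read_descriptions(batch_text.splitlines())
print(descriptions)
```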

### Samples

[Google Drive Folder](https://drive.google.com/drive/folders/1kdWB1CBog4NGVJ7jWddKLtBAuPm3gwDq?usp=drive_link)

## OS Support

Tested on Ubuntu 22.04 (Jammy) LTS.

## Error Handling

If the batch file is not found, an error message is shown. If no description is provided and no batch file is used, a usage error is raised.
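
The two checks described above could look roughly like the sketch below. `resolve_descriptions`, the exception types and the messages are all assumptions made for illustration; the CLI's real error reporting may differ, and joining multiple description arguments with spaces is likewise a guess.

```python
import os


def resolve_descriptions(description, batch=None):
    """Pick descriptions from CLI arguments or from a batch file.

    Hypothetical helper: the names, exception types and messages are
    illustrative and not taken from lg_audiogen.
    """
    if batch is not None:
        if not os.path.isfile(batch):
            raise FileNotFoundError(f"Batch file not found: {batch}")
        with open(batch, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    if not description:
        raise ValueError("Provide a description or a batch file")
    # Assumption: several DESCRIPTION arguments form one description
    return [" ".join(description)]
```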
Empty file.
145 changes: 145 additions & 0 deletions extensions/lg_audiogen/lg_audiogen/calendar_reader.py
@@ -0,0 +1,145 @@
from icalendar import Calendar
from datetime import datetime, date, timedelta, timezone
from dateutil.rrule import rrulestr

MIN_YEAR = datetime.now().year
MAX_YEAR = MIN_YEAR

def is_within_limit(dt):
    """
    Checks if the datetime is within the limit.

    @param dt: The datetime to check.

    @return: True if the datetime is within the limit, False otherwise.
    """
    return MIN_YEAR <= dt.year <= MAX_YEAR

def convert_to_utc(dt):
    """
    Converts a datetime with timezone info to UTC.

    @param dt: The datetime to convert.

    @return: The datetime converted to UTC.
    """
    if isinstance(dt, datetime) and dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None:
        # Convert offset-aware datetime to UTC
        return dt.astimezone(timezone.utc)
    return dt

def datetime_to_timestamp(dt):
    """
    Converts a datetime or date to a timestamp.

    @param dt: The datetime or date to convert.

    @return: The timestamp.
    """
    if isinstance(dt, datetime):
        return dt.timestamp()
    elif isinstance(dt, date):
        return datetime.combine(dt, datetime.min.time(), tzinfo=timezone.utc).timestamp()
    raise TypeError("Expected datetime.datetime or datetime.date")

def populate_events(start_dt, calendar_events, summary, duration):
    """
    Populates the calendar_events dictionary with the events.

    @param start_dt: The start datetime.
    @param calendar_events: The dictionary of events.
    @param summary: The title/summary of the event.
    @param duration: The duration of the event.

    @return: 1 if the event was added, 0 otherwise.
    """
    if not is_within_limit(start_dt):
        return 0

    # Ensure dt is converted to UTC if it's a datetime with timezone info.
    utc_start_dt = convert_to_utc(start_dt)
    # Create timestamp from datetime or date (for sorting later)
    timestamp = datetime_to_timestamp(utc_start_dt)

    # datetime is a subclass of date, so test for datetime explicitly:
    # timed events use their UTC date, all-day events use their own date.
    dt_str = utc_start_dt.strftime('%Y-%m-%d') if isinstance(start_dt, datetime) \
        else start_dt.strftime('%Y-%m-%d')

    if dt_str not in calendar_events:
        calendar_events[dt_str] = []

    event = {'name': summary, 'duration': duration, 'ts': timestamp}
    calendar_events[dt_str].append(event)
    return 1

def populate_recurring_events(component, start_dt, calendar_events, summary, duration):
    """
    Populates the calendar_events dictionary with the recurring events.

    @param component: The component to populate the events from.
    @param start_dt: The start datetime.
    @param calendar_events: The dictionary of events.
    @param summary: The title/summary of the event.
    @param duration: The duration of the event.
    """
    # rr will give us a generator
    rr = rrulestr(component.get('rrule').to_ical().decode('utf-8'), dtstart=start_dt)
    for dt in rr:
        if dt.year < MIN_YEAR:
            continue # rule started before the range; later occurrences may still fall in it
        if populate_events(dt, calendar_events, summary, duration) == 0:
            return # short circuit once we're past the range


def calendar_to_dictionary(filepath):
    """
    Given a filepath to a calendar file, returns a dictionary of events.

    @param filepath: The filepath to the calendar file.

    @return: A dictionary of events from the .ics file.
    """
    # Read the user's calendar file and parse it into an icalendar object
    with open(filepath, 'r', encoding='utf-8') as f:
        gcal = Calendar.from_ical(f.read())

    # holds data in the format {'2023-11-06': [Event]} of the user's calendar
    calendar_events = {}

    for component in gcal.walk():
        if component.name == "VEVENT":
            # Extract information about the event
            summary = str(component.get('summary'))
            start_dt = component.get('dtstart').dt
            end_dt = component.get('dtend').dt
            duration = int((end_dt - start_dt).total_seconds() / 60) # duration in minutes

            # rrule Builds up the missing events that are defined by the recurring rules
            # Ex: Meetings that happen every M, W, F
            if 'rrule' in component:
                populate_recurring_events(component, start_dt, calendar_events, summary, duration)
            else:
                populate_events(start_dt, calendar_events, summary, duration)

    return calendar_events

def get_events_between_dates(calendar_events, start_date_str, end_date_str):
    """
    Given a dictionary of events, returns the events between two dates [start_date, end_date].

    @param calendar_events: The dictionary of events.
    @param start_date_str: The start date.
    @param end_date_str: The end date.

    @return: The events between the two dates.
    """
    # Assumes start_date_str and end_date_str are in YYYY-MM-DD format and start_date <= end_date
    start_date = datetime.strptime(start_date_str, '%Y-%m-%d').date()
    end_date = datetime.strptime(end_date_str, '%Y-%m-%d').date()

    events_between_dates = {}
    current_date = start_date
    while current_date <= end_date:
        date_str = current_date.strftime('%Y-%m-%d')
        if date_str in calendar_events:
            # Sort events for the current date by timestamp key 'ts' in ascending order
            events_between_dates[date_str] = sorted(calendar_events[date_str], key=lambda event: event['ts'])
        current_date += timedelta(days=1)
    return events_between_dates
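
The events dictionary in this file is keyed by 'YYYY-MM-DD' strings derived from either an all-day `date` or a timed `datetime`. One detail worth keeping in mind when deriving such keys is that `datetime` is a subclass of `date`, so an `isinstance(x, date)` check is also true for datetimes. The sketch below shows one way to build a key that respects that; `day_key` is a hypothetical helper and not part of calendar_reader.py.

```python
from datetime import date, datetime, timedelta, timezone


def day_key(start):
    """Return a 'YYYY-MM-DD' key for a date or datetime (hypothetical helper).

    Offset-aware datetimes are normalised to UTC first; plain dates and
    naive datetimes are formatted as they are.
    """
    # Check for datetime explicitly: every datetime is also a date.
    if isinstance(start, datetime) and start.utcoffset() is not None:
        start = start.astimezone(timezone.utc)
    return start.strftime("%Y-%m-%d")


if __name__ == "__main__":
    utc_minus_5 = timezone(timedelta(hours=-5))
    late_evening = datetime(2023, 11, 6, 22, 30, tzinfo=utc_minus_5)
    print(isinstance(late_evening, date))  # True: datetime subclasses date
    print(day_key(late_evening))           # 2023-11-07 (03:30 UTC the next day)
    print(day_key(date(2023, 11, 6)))      # 2023-11-06
```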
71 changes: 71 additions & 0 deletions extensions/lg_audiogen/lg_audiogen/gpt_utility.py
@@ -0,0 +1,71 @@
import os
import json
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def query_gpt(event_list, deterministic=False):
    """
    Queries GPT-3.5 to generate a response based on the given event list.

    @param event_list: The list of events to be used as input.
    @param deterministic: Flag indicating whether to use deterministic mode for GPT response generation.

    @return: The response generated by GPT-3.5 as a list of strings.
    """
    # The few-shot assistant messages are valid JSON (quoted "sounds" key)
    # so that they match the json_object response format requested below.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {
                "role": "system",
                "content": "Creative assistant in generating sound prompts from a given list of events. Outputs a json object of sounds. Size of the output should be the same as the input"
            },
            {
                "role": "user",
                "content": "[\"Commute to work\", \"Walk by the beach\"]"
            },
            {
                "role": "assistant",
                "content": "{\"sounds\": [\"Cars honking in traffic\", \"Footsteps tapping on the sand with waves in the background\"]}"
            },
            {
                "role": "user",
                "content": "[\"Virtual Meeting with Nathan\", \"Beer and Chips with Friends\"]"
            },
            {
                "role": "assistant",
                "content": "{\"sounds\": [\"Keyboard typing and mouse clicks\", \"Laughter and the clinking of glasses, crunching of chips\"]}"
            },
            {
                "role": "user",
                "content": "[\"Meeting with Joe\"]"
            },
            {
                "role": "assistant",
                "content": "{\"sounds\": [\"Keyboard typing and mouse clicks with chatter in the background\"]}"
            },
            {
                "role": "user",
                "content": "[\"'23.FAL.B.1 Pod Meeting - MLH Fellowship\", \"Oscar Mier and Nathan Kurelo Wilk\", \"Monday MS FinTech Classes\", \"Tuesday MS FinTech Classes\", \"23.FAL.B.1 Pod Meeting - MLH Fellowship\", \"Wednesday MS FinTech Classes\"]"
            },
            {
                "role": "assistant",
                "content": "{\"sounds\": [\"Mic feedback, low murmur of voices discussing on a conference call\",\"Ambient room noise\",\"Turning pages, lecturer speaking faintly in the background\",\"Turning pages, lecturer speaking faintly in the background\",\"Mic feedback, low murmur of voices discussing on a conference call\",\"Turning pages, lecturer speaking faintly in the background\"]}"
            },
            {
                "role": "user",
                "content": json.dumps(event_list)
            }
        ],
        temperature=0 if deterministic else 1,
        max_tokens=1101,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        response_format={ "type": "json_object" }
    )
    response = json.loads(response.choices[0].message.content).get("sounds")
    print("GPT Response", response)
    return response
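
Because the system prompt asks for one sound per input event, a caller may want to validate the model output before using it. The following is a minimal sketch of such a check, assuming the raw message content is available as a string. `parse_sounds` is a hypothetical helper that is not part of gpt_utility.py, and falling back to the event names is an assumption of this example.

```python
import json


def parse_sounds(raw, event_list):
    """Extract the 'sounds' list from a JSON string (hypothetical helper).

    Falls back to the original event names when the payload is not valid
    JSON, has no 'sounds' list, or the list length does not match the input.
    """
    try:
        sounds = json.loads(raw).get("sounds")
    except (json.JSONDecodeError, AttributeError, TypeError):
        return list(event_list)
    if not isinstance(sounds, list) or len(sounds) != len(event_list):
        return list(event_list)
    return [str(sound) for sound in sounds]


if __name__ == "__main__":
    raw = '{"sounds": ["Cars honking in traffic", "Waves on the sand"]}'
    print(parse_sounds(raw, ["Commute to work", "Walk by the beach"]))
```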
50 changes: 50 additions & 0 deletions extensions/lg_audiogen/lg_audiogen/keyword_generator.py
@@ -0,0 +1,50 @@
import os
import json
import random

# This is the default keyword dictionary. It is a JSON file that maps keywords to prompts
# The CLI will allow the user to input their own dictionary of keywords
THIS_DIR = os.path.dirname(os.path.abspath(__file__))
KEYWORD_DICT = "/static_inputs/prompt_keywords.json"

# SEED for Deterministic Randomness
DEFAULT_SEED = 42

# First try to load KEYWORD_DICT; if it is not found, try THIS_DIR + KEYWORD_DICT
try:
    try:
        with open(KEYWORD_DICT, encoding='utf-8') as f:
            PROMPT_KEYWORDS = json.load(f)
    except FileNotFoundError:
        with open(THIS_DIR + KEYWORD_DICT, encoding='utf-8') as f:
            PROMPT_KEYWORDS = json.load(f)
except Exception as exc:
    raise Exception("Could not load keyword dictionary. Please check that the file exists.") from exc

# for each word in the event name, check if it matches a keyword
# if it does, add one of the random prompts to the list to return
# deterministic=True will make the random choice deterministic
def get_prompts(event_names, deterministic=False):
    """
    Creates a prompt for each event name by matching keywords
    in the event name to prompts in the keyword dictionary.

    @param event_names: A list of event names
    @param deterministic: A boolean to make the random choice deterministic
    @return: A list of prompts for each event name
    """
    # An empty dictionary is falsy, so a single check covers None and {}
    if not PROMPT_KEYWORDS:
        raise Exception("Keyword dictionary is empty. Please check that the file is not empty.")
    full_prompt = []
    for event in event_names:
        event_name = event.lower()
        prompt = []
        random.seed(DEFAULT_SEED if deterministic else None)
        # Match on the lowercased name so "Meeting" and "meeting" behave the same
        for word in event_name.split():
            if word in PROMPT_KEYWORDS:
                prompt.append(random.choice(PROMPT_KEYWORDS[word]))
        if len(prompt) > 1:
            prompt = ' combined with '.join(prompt)
            full_prompt.append(prompt)
        elif len(prompt) == 1:
            full_prompt.append(prompt[0])
        else:
            full_prompt.append(event_name) # if no prompt is found, just use the event name
    return full_prompt
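
As a variation on the keyword matching above, the sketch below uses a local `random.Random` instance instead of reseeding the global generator, which keeps deterministic mode from affecting other users of `random`. Everything here is illustrative: `pick_prompts` and `SAMPLE_KEYWORDS` are hypothetical, the real dictionary lives in prompt_keywords.json, and this version seeds once per call rather than once per event.

```python
import random

# Hypothetical keyword dictionary, standing in for prompt_keywords.json
SAMPLE_KEYWORDS = {
    "meeting": ["Keyboard typing", "Low murmur of voices"],
    "lunch": ["Cutlery clinking", "Restaurant chatter"],
}


def pick_prompts(event_names, keywords, seed=None):
    """Return one prompt per event name using a local RNG (hypothetical helper)."""
    rng = random.Random(seed)  # seed=None draws fresh entropy, as random.seed(None) does
    prompts = []
    for event in event_names:
        words = event.lower().split()
        matches = [rng.choice(keywords[word]) for word in words if word in keywords]
        # With no keyword match, fall back to the lowercased event name
        prompts.append(" combined with ".join(matches) if matches else event.lower())
    return prompts


if __name__ == "__main__":
    events = ["Meeting with Nathan", "Lunch meeting", "Gym"]
    first = pick_prompts(events, SAMPLE_KEYWORDS, seed=42)
    second = pick_prompts(events, SAMPLE_KEYWORDS, seed=42)
    print(first == second)  # True: the same seed gives the same choices
```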