facebookresearch · Nate8888 · Oct 20, 2023
diff --git a/.github/workflows/labgraph_audiogen.yml b/.github/workflows/labgraph_audiogen.yml
@@ -0,0 +1,31 @@
+name: AudioGen Tests
+
+on: [push]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+    - name: Checkout code
+      uses: actions/checkout@v2
+
+    - name: Setup Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: '3.8'
+
+    - name: Install dependencies
+      run: |
+        cd extensions/lg_audiogen
+        python -m pip install --upgrade pip
+        sudo apt-get install -y ffmpeg
+        pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+        pip install --pre xformers
+        pip install -e .
+        pip install pytest
+
+    - name: Run tests
+      run: |
+        cd extensions/lg_audiogen
+        pytest -vvv
diff --git a/extensions/lg_audiogen/README.md b/extensions/lg_audiogen/README.md
@@ -0,0 +1,82 @@
+# Audiogen
+
+Audiogen is a Python command-line tool that uses models from Audiocraft's AudioGen to generate audio from text descriptions. It can generate a single piece of audio from one description, multiple pieces of audio from a batch file of descriptions, or audio for activities given as a comma-separated string or an `.ics` calendar file.
+
+## Features
+
+* Ability to specify duration of the generated audio.
+* Ability to generate audio based on a batch file.
+* Ability to specify the model to be used for the audio generation.
+* Ability to set the output file name.
+* Ability to generate audio based on daily activities from a comma-separated string or a `.ics` calendar file.
+* Ability to integrate with GPT models to enhance activity descriptions.
+* Ability to enable pseudo-deterministic activity prompts.
+* Ability to specify a date or a range of dates to get events from the `.ics` calendar file.
+
+## Setup
+
+Audiocraft needs Python 3.8 or higher to run. If you have a suitable version of Python installed, you can install Audiogen with pip from the `extensions/lg_audiogen` directory:
+
+```shell
+pip install -e .
+```
+
+## Usage
+
+### Command-line interface
+
+The CLI usage for Audiogen is `lg_audiogen [OPTIONS] [DESCRIPTION]...`.
+
+### Options
+
+* `description`: the text description to generate audio from.
+* `duration, -d`: duration of the generated audio, default is 5.
+* `model, -m`: name of the Audiocraft AudioGen model to use, default is 'facebook/audiogen-medium'.
+* `output, -o`: name of the output file.
+* `batch, -b`: name of a file containing one audio description per line.
+* `activities, -a`: comma-separated string or `.ics` calendar file containing events.
+* `gpt`: flag to enable a GPT model to enhance activity descriptions.
+* `deterministic`: flag to enable deterministic generation.
+* `dates, -dt`: date in the format 'YYYY-MM-DD' or a range 'YYYY-MM-DD,YYYY-MM-DD'.
+
+### Example
+
+To generate audio, use one of the following commands:
+
+```shell
+lg_audiogen -d 5 -m 'facebook/audiogen-medium' -o 'my_output' 'dog barking'
+
+lg_audiogen 'dog barking'
+
+lg_audiogen -b 'batch.txt'
+
+lg_audiogen -a 'meeting with nathan, lunch with friends' -gpt -deterministic
+
+lg_audiogen -a "calendar.ics" -gpt -dt '2023-11-29,2023-12-01'
+```
+
+**Note:** for GPT usage, create a `.env` file with the same format as the `sample.env` file provided.
+
+### Batch File Format
+
+The batch file should contain one description per line. The descriptions should be in the same format as the descriptions used in the command-line interface.
+
+Example:
+
+*batch.txt*
+```txt
+Natural sounds of a rainforest
+Bird Chirping in the background 
+```
+
+### Samples
+
+[Google Drive Folder](https://drive.google.com/drive/folders/1kdWB1CBog4NGVJ7jWddKLtBAuPm3gwDq?usp=drive_link)
+
+## OS Support
+
+Tested on Ubuntu 22.04 (Jammy) LTS.
+
+## Error Handling
+
+If the batch file is not found, an error message is shown. If no description is provided and no batch file is used, a usage error is raised.
diff --git a/extensions/lg_audiogen/lg_audiogen/__init__.py b/extensions/lg_audiogen/lg_audiogen/__init__.py
diff --git a/extensions/lg_audiogen/lg_audiogen/calendar_reader.py b/extensions/lg_audiogen/lg_audiogen/calendar_reader.py
@@ -0,0 +1,145 @@
+from icalendar import Calendar
+from datetime import datetime, date, timedelta, timezone
+from dateutil.rrule import rrulestr
+
+MIN_YEAR = datetime.now().year
+MAX_YEAR = MIN_YEAR 
+
+def is_within_limit(dt):
+    """
+    Checks if the datetime's year falls within [MIN_YEAR, MAX_YEAR].
+
+    @param dt: The datetime to check.
+
+    @return: True if the year is within the limit, False otherwise.
+    """
+    return MIN_YEAR <= dt.year <= MAX_YEAR
+
+def convert_to_utc(dt):
+    """
+    Converts a datetime with timezone info to UTC.
+
+    @param dt: The datetime to convert.
+
+    @return: The datetime converted to UTC.
+    """
+    if isinstance(dt, datetime) and dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None:
+        # Convert offset-aware datetime to UTC
+        return dt.astimezone(timezone.utc)
+    return dt
+
+def datetime_to_timestamp(dt):
+    """
+    Converts a datetime or date to a timestamp.
+
+    @param dt: The datetime or date to convert.
+
+    @return: The timestamp.
+    """
+    if isinstance(dt, datetime):
+        return dt.timestamp()
+    elif isinstance(dt, date):
+        return datetime.combine(dt, datetime.min.time(), tzinfo=timezone.utc).timestamp()
+    raise TypeError("Expected datetime.datetime or datetime.date")
+
+def populate_events(start_dt, calendar_events, summary, duration):
+    """
+    Populates the calendar_events dictionary with the events.
+
+    @param start_dt: The start datetime.
+    @param calendar_events: The dictionary of events.
+    @param summary: The title/summary of the event.
+    @param duration: The duration of the event.
+
+    @return: 1 if the event was added, 0 otherwise.
+    """
+    if not is_within_limit(start_dt):
+        return 0
+
+    # Ensure dt is converted to UTC if it's a datetime with timezone info.
+    utc_start_dt = convert_to_utc(start_dt)
+    # Create timestamp from datetime or date (for sorting later)
+    timestamp = datetime_to_timestamp(utc_start_dt)
+
+    dt_str = utc_start_dt.strftime('%Y-%m-%d') if isinstance(start_dt, datetime) \
+        else start_dt.strftime('%Y-%m-%d')
+
+    if dt_str not in calendar_events:
+        calendar_events[dt_str] = []
+
+    event = {'name': summary, 'duration': duration, 'ts': timestamp}
+    calendar_events[dt_str].append(event)
+    return 1
+
+def populate_recurring_events(component, start_dt, calendar_events, summary, duration):
+    """
+    Populates the calendar_events dictionary with the recurring events.
+
+    @param component: The component to populate the events from.
+    @param start_dt: The start datetime.
+    @param calendar_events: The dictionary of events.
+    @param summary: The title/summary of the event.
+    @param duration: The duration of the event.
+    """
+    # rr is an iterable of occurrence datetimes, in ascending order
+    rr = rrulestr(component.get('rrule').to_ical().decode('utf-8'), dtstart=start_dt)
+    for dt in rr:
+        if populate_events(dt, calendar_events, summary, duration) == 0 and dt.year > MAX_YEAR:
+            return # short circuit once we're past the range; earlier years are skipped
+
+
+def calendar_to_dictionary(filepath):
+    """
+    Given a filepath to a calendar file, returns a dictionary of events.
+
+    @param filepath: The filepath to the calendar file.
+
+    @return: A dictionary of events from the .ics file.
+    """
+    # Read the user's calendar file and parse it into an icalendar object
+    with open(filepath, 'r', encoding='utf-8') as f:
+        gcal = Calendar.from_ical(f.read())
+
+    # holds data in the format {'2023-11-06': [Event]} of the user's calendar
+    calendar_events = {}
+
+    for component in gcal.walk():
+        if component.name == "VEVENT":
+            # Extract information about the event
+            summary = str(component.get('summary'))
+            start_dt = component.get('dtstart').dt
+            end_dt = component.get('dtend').dt
+            duration = int((end_dt - start_dt).total_seconds() / 60)  # duration in minutes
+
+            # rrule Builds up the missing events that are defined by the recurring rules
+            # Ex: Meetings that happen every M, W, F
+            if 'rrule' in component:
+                populate_recurring_events(component, start_dt, calendar_events, summary, duration)
+            else:
+                populate_events(start_dt, calendar_events, summary, duration)
+
+    return calendar_events
+
+def get_events_between_dates(calendar_events, start_date_str, end_date_str):
+    """
+    Given a dictionary of events, returns the events between two dates [start_date, end_date].
+
+    @param calendar_events: The dictionary of events.
+    @param start_date_str: The start date.
+    @param end_date_str: The end date.
+
+    @return: The events between the two dates.
+    """
+    # Assumes start_date_str and end_date_str are in YYYY-MM-DD format and start_date <= end_date
+    start_date = datetime.strptime(start_date_str, '%Y-%m-%d').date()
+    end_date = datetime.strptime(end_date_str, '%Y-%m-%d').date()
+
+    events_between_dates = {}
+    current_date = start_date
+    while current_date <= end_date:
+        date_str = current_date.strftime('%Y-%m-%d')
+        if date_str in calendar_events:
+            # Sort events for the current date by timestamp key 'ts' in ascending order
+            events_between_dates[date_str] = sorted(calendar_events[date_str], key=lambda event: event['ts'])
+        current_date += timedelta(days=1)
+    return events_between_dates
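
Review note on the date key used in `populate_events`: in Python, `datetime` is a subclass of `date`, so `isinstance(x, date)` is true for timed events as well as all-day dates. The sketch below shows one way to tell the two apart and derive a UTC day key. `day_key` is a hypothetical helper written for illustration; it is not part of this PR.

```python
from datetime import date, datetime, timedelta, timezone


def day_key(dt):
    """Return a 'YYYY-MM-DD' key; timezone-aware datetimes are keyed by their UTC day.

    Hypothetical helper for illustration only.
    """
    # Check for datetime first: every datetime is also a date.
    if isinstance(dt, datetime):
        if dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None:
            dt = dt.astimezone(timezone.utc)
        return dt.strftime('%Y-%m-%d')
    if isinstance(dt, date):
        return dt.strftime('%Y-%m-%d')
    raise TypeError("Expected datetime.datetime or datetime.date")


# 23:30 at UTC-5 is 04:30 on the following day in UTC.
late_evening = datetime(2023, 11, 6, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(isinstance(late_evening, date))
print(day_key(late_evening))
print(day_key(date(2023, 11, 6)))
```

Keying by the UTC day keeps the dictionary key consistent with the UTC-based `ts` value used for sorting; keying by the local day would be an equally valid choice if that is what users expect from `-dt`.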
diff --git a/extensions/lg_audiogen/lg_audiogen/gpt_utility.py b/extensions/lg_audiogen/lg_audiogen/gpt_utility.py
@@ -0,0 +1,71 @@
+import os
+import json
+from openai import OpenAI
+from dotenv import load_dotenv
+load_dotenv()
+
+client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+
+def query_gpt(event_list, deterministic=False):
+    """
+    Queries GPT-3.5 to generate a response based on the given event list.
+
+    @param event_list: The list of events to be used as input.
+    @param deterministic: Flag indicating whether to use deterministic mode for GPT response generation.
+
+    @return: The response generated by GPT-3.5 as a list of strings.
+    """
+    response = client.chat.completions.create(
+        model="gpt-3.5-turbo-1106",
+        messages=[
+            {
+                "role": "system",
+                "content": "Creative assistant in generating sound prompts from a given list of events. Outputs a json object of sounds. Size of the output should be the same as the input"
+            },
+            {
+                "role": "user",
+                "content": "[\"Commute to work\", \"Walk by the beach\"]"
+            },
+            {
+                "role": "assistant",
+                "content": "{\"sounds\": [\"Cars honking in traffic\", \"Footsteps tapping on the sand with waves in the background\"]}"
+            },
+            {
+                "role": "user",
+                "content": "[\"Virtual Meeting with Nathan\", \"Beer and Chips with Friends\"]"
+            },
+            {
+                "role": "assistant",
+                "content": "{\"sounds\": [\"Keyboard typing and mouse clicks\", \"Laughter and the clinking of glasses, crunching of chips\"]}"
+            },
+            {
+                "role": "user",
+                "content": "[\"Meeting with Joe\"]"
+            },
+            {
+                "role": "assistant",
+                "content": "{\"sounds\": [\"Keyboard typing and mouse clicks with chatter in the background\"]}"
+            },
+            {
+                "role": "user",
+                "content": "[\"'23.FAL.B.1 Pod Meeting - MLH Fellowship\", \"Oscar Mier and Nathan Kurelo Wilk\", \"Monday MS FinTech Classes\", \"Tuesday MS FinTech Classes\", \"23.FAL.B.1 Pod Meeting - MLH Fellowship\", \"Wednesday MS FinTech Classes\"]"
+            },
+            {
+                "role": "assistant",
+                "content": "{\"sounds\": [\"Mic feedback, low murmur of voices discussing on a conference call\",\"Ambient room noise\",\"Turning pages, lecturer speaking faintly in the background\",\"Turning pages, lecturer speaking faintly in the background\",\"Mic feedback, low murmur of voices discussing on a conference call\",\"Turning pages, lecturer speaking faintly in the background\"]}"
+            },
+            {
+                "role": "user",
+                "content": json.dumps(event_list)
+            }
+        ],
+        temperature=0 if deterministic else 1,
+        max_tokens=1101,
+        top_p=1,
+        frequency_penalty=0,
+        presence_penalty=0,
+        response_format={ "type": "json_object" }
+    )
+    response = json.loads(response.choices[0].message.content).get("sounds")
+    print("GPT Response", response)
+    return response
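
Review note: the system prompt asks for an output list the same size as the input, but `query_gpt` returns whatever the `sounds` key holds. A minimal sketch of a check the caller could apply is shown below, assuming the reply is a JSON object with a `sounds` list. `parse_sounds` is a hypothetical helper, not part of this PR or of the OpenAI client.

```python
import json


def parse_sounds(raw_content, expected_count):
    """Parse a JSON reply of the form {"sounds": [...]} and check its length.

    Hypothetical helper for illustration; raises ValueError on a mismatch.
    """
    data = json.loads(raw_content)
    sounds = data.get("sounds") if isinstance(data, dict) else None
    if not isinstance(sounds, list):
        raise ValueError("reply has no 'sounds' list")
    if len(sounds) != expected_count:
        raise ValueError(f"expected {expected_count} sounds, got {len(sounds)}")
    return sounds


events = ["Commute to work", "Walk by the beach"]
reply = '{"sounds": ["Cars honking in traffic", "Waves and footsteps on sand"]}'
print(parse_sounds(reply, len(events)))
```

Raising on a mismatch is only one option; falling back to the original event names for the missing entries would also keep prompts and events aligned.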
diff --git a/extensions/lg_audiogen/lg_audiogen/keyword_generator.py b/extensions/lg_audiogen/lg_audiogen/keyword_generator.py
@@ -0,0 +1,50 @@
+import os
+import json
+import random
+
+# This is the default keyword dictionary. It is a JSON file that maps keywords to prompts.
+# The CLI will allow users to supply their own dictionary of keywords.
+THIS_DIR = os.path.dirname(os.path.abspath(__file__))
+KEYWORD_DICT = "/static_inputs/prompt_keywords.json"
+
+# SEED for Deterministic Randomness
+DEFAULT_SEED = 42
+
+# First try to load KEYWORD_DICT as given; if it is not found, try THIS_DIR + KEYWORD_DICT
+try:
+    PROMPT_KEYWORDS = json.load(open(KEYWORD_DICT))
+except FileNotFoundError:
+    PROMPT_KEYWORDS = json.load(open(THIS_DIR + KEYWORD_DICT))
+except Exception as exc:
+    raise Exception("Could not load keyword dictionary. Please check that the file exists.") from exc
+
+# For each word in the event name, check whether it matches a keyword.
+# If it does, add one of that keyword's prompts, chosen at random, to the list to return.
+# deterministic=True reseeds the random generator so the choice is repeatable.
+def get_prompts(event_names, deterministic=False):
+    """
+    Creates a prompt for each event name by matching keywords
+    in the event name to prompts in the keyword dictionary.
+
+    @param event_names: A list of event names
+    @param deterministic: A boolean to make the random choice deterministic
+    @return: A list of prompts for each event name
+    """
+    if not PROMPT_KEYWORDS:
+        raise Exception("Keyword dictionary is empty. Please check that the file is not empty.")
+    full_prompt = []
+    for event in event_names:
+        event_name = event.lower()
+        prompt = []
+        random.seed(DEFAULT_SEED if deterministic else None)
+        for word in event_name.split():
+            if word in PROMPT_KEYWORDS:
+                prompt.append(random.choice(PROMPT_KEYWORDS[word]))
+        if len(prompt) > 1:
+            prompt = ' combined with '.join(prompt)
+            full_prompt.append(prompt)
+        elif len(prompt) == 1:
+            full_prompt.append(prompt[0])
+        else:
+            full_prompt.append(event_name) # if no prompt is found, just use the event name
+    return full_prompt
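
Review note: the keyword-matching behavior described in `get_prompts` can be sketched in isolation as follows. The dictionary contents and the name `match_prompts` are hypothetical (the real keywords live in `prompt_keywords.json`, which is not shown here), and the sketch uses a local `random.Random` instance rather than reseeding the global generator.

```python
import random

# Hypothetical keyword dictionary; the real one lives in prompt_keywords.json.
EXAMPLE_KEYWORDS = {
    "meeting": ["Keyboard typing and mouse clicks"],
    "lunch": ["Cutlery clinking on plates", "Restaurant chatter"],
}


def match_prompts(event_names, keywords, seed=None):
    """Map each event name to a prompt built from its matching keywords."""
    prompts = []
    for event in event_names:
        event_name = event.lower()
        rng = random.Random(seed)  # a fixed seed makes the choice repeatable
        matched = [rng.choice(keywords[word])
                   for word in event_name.split() if word in keywords]
        # Fall back to the lowercased event name when nothing matches.
        prompts.append(' combined with '.join(matched) if matched else event_name)
    return prompts


print(match_prompts(["Lunch Meeting", "Dentist"], EXAMPLE_KEYWORDS, seed=42))
```

Matching on the lowercased name means "Lunch" and "lunch" hit the same keyword, provided the dictionary keys are lowercase.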