
ikabbash/topics-extractor


Topics Extractor

This project demonstrates how to use LangChain with Gemini (or any other LLM) to extract topics from transcripts in a cloud-based, serverless environment. The extracted topics can then be further leveraged to auto-generate hashtags for short videos, store insights in a database, or enable further content analysis.

The main idea is to automate topic generation from transcriber (speech-to-text) output. This task has traditionally relied on topic modeling techniques such as Latent Semantic Analysis (LSA) to identify the core subjects in a transcript; here, an LLM is prompted to do the extraction instead.
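The core pattern is prompt-and-parse: ask the model for a comma-separated topic list, then turn its free-text reply into a Python list. The sketch below illustrates this without any LangChain dependency; the prompt wording, function names, and sample reply are illustrative, not taken from main.py.

```python
# Minimal sketch of the prompt-and-parse pattern (names are illustrative).
PROMPT = (
    "Extract the main topics from the transcript below.\n"
    "Reply with a comma-separated list only.\n\n"
    "Transcript:\n{transcript}"
)

def build_prompt(transcript: str) -> str:
    """Fill the transcript into the prompt template."""
    return PROMPT.format(transcript=transcript)

def parse_topics(raw: str) -> list[str]:
    """Turn the model's comma-separated reply into a clean list."""
    return [t.strip() for t in raw.split(",") if t.strip()]

# In the real pipeline the reply comes from the LLM; here it is hard-coded.
reply = "machine learning, cloud computing, serverless"
print(parse_topics(reply))  # ['machine learning', 'cloud computing', 'serverless']
```

In LangChain terms this corresponds to composing a prompt template, the LLM, and an output parser into one chain.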

Cloud Function Diagram

You can run the script locally by adding your Gemini API key to the .env file; a Jupyter Notebook is also available. Optionally, you can deploy the serverless function on GCP by following the steps: apply the Terraform code to create the resources (without the speech-to-text service), then test by uploading a .txt file to the cloud bucket, which triggers the function.
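A bucket-triggered Cloud Function receives an event dict describing the uploaded object. The sketch below shows the general shape of such an entry point under the background-function signature; the function name, bucket name, and the skip/process messages are assumptions for illustration, and the actual download and LangChain call are left as a comment.

```python
# Hypothetical sketch of a GCS-triggered Cloud Function entry point.
def extract_topics(event: dict, context=None) -> str:
    """Triggered when a file lands in the bucket; only .txt files are processed."""
    name = event["name"]
    if not name.endswith(".txt"):
        return f"skipped {name}"
    # In the real function, the transcript would be downloaded with
    # google-cloud-storage and passed to the LangChain pipeline here.
    return f"processing gs://{event['bucket']}/{name}"

print(extract_topics({"bucket": "transcripts", "name": "episode1.txt"}))
```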

If you want to use ChatGPT instead of Gemini, replace the following lines in main.py and replace GOOGLE_API_KEY with OPENAI_API_KEY in the .env file.

# from langchain_google_genai import GoogleGenerativeAI # Comment this
from langchain_openai import OpenAI

# llm = GoogleGenerativeAI(model="gemini-pro") # Comment this
llm = OpenAI()
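Since both integrations read their API key from the environment, the provider swap can also be isolated behind a small helper so only one setting changes. This factory function is a hypothetical sketch, not part of main.py:

```python
# Hypothetical helper isolating the provider choice (names are illustrative).
def make_llm(provider: str):
    """Return a LangChain LLM for the chosen provider; imports are deferred
    so only the installed integration package is needed."""
    if provider == "gemini":
        from langchain_google_genai import GoogleGenerativeAI
        return GoogleGenerativeAI(model="gemini-pro")  # uses GOOGLE_API_KEY
    if provider == "openai":
        from langchain_openai import OpenAI
        return OpenAI()  # uses OPENAI_API_KEY
    raise ValueError(f"unknown provider: {provider}")
```

Deferring the imports inside each branch means installing only langchain-google-genai or langchain-openai is enough, depending on which provider you pick.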
