
ikabbash/topics-extractor


Topics Extractor

This project demonstrates how to use LangChain with Gemini (or any other LLM) to extract topics from transcripts in a cloud-based, serverless environment. The extracted topics can then be further leveraged to auto-generate hashtags for short videos, store insights in a database, or enable further content analysis.

The main idea is to automate topic generation from transcriber (speech-to-text) output. This task has traditionally relied on topic modeling techniques such as Latent Semantic Analysis (LSA) to identify the core subjects in a transcript; here, an LLM is prompted to do the extraction instead.
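The core pattern is prompt-and-parse: ask the model for a comma-separated topic list, then turn its free-text reply into a Python list. The sketch below illustrates this without any LangChain dependency; the prompt wording, function names, and sample reply are illustrative, not taken from main.py.

```python
# Minimal sketch of the prompt-and-parse pattern (names are illustrative).
PROMPT = (
    "Extract the main topics from the transcript below.\n"
    "Reply with a comma-separated list only.\n\n"
    "Transcript:\n{transcript}"
)

def build_prompt(transcript: str) -> str:
    """Fill the transcript into the prompt template."""
    return PROMPT.format(transcript=transcript)

def parse_topics(raw: str) -> list[str]:
    """Turn the model's comma-separated reply into a clean list."""
    return [t.strip() for t in raw.split(",") if t.strip()]

# In the real pipeline the reply comes from the LLM; here it is hard-coded.
reply = "machine learning, cloud computing, serverless"
print(parse_topics(reply))  # ['machine learning', 'cloud computing', 'serverless']
```

In LangChain terms this corresponds to composing a prompt template, the LLM, and an output parser into one chain.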

Cloud Function Diagram

You can run the script locally by adding your Gemini API key to the .env file; a Jupyter Notebook is also available. Optionally, you can deploy the serverless function on GCP by following the steps: apply the Terraform code to create the resources (without the speech-to-text service), then test by uploading a .txt file to the cloud bucket, which triggers the function.
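A bucket-triggered Cloud Function receives an event dict describing the uploaded object. The sketch below shows the general shape of such an entry point under the background-function signature; the function name, bucket name, and the skip/process messages are assumptions for illustration, and the actual download and LangChain call are left as a comment.

```python
# Hypothetical sketch of a GCS-triggered Cloud Function entry point.
def extract_topics(event: dict, context=None) -> str:
    """Triggered when a file lands in the bucket; only .txt files are processed."""
    name = event["name"]
    if not name.endswith(".txt"):
        return f"skipped {name}"
    # In the real function, the transcript would be downloaded with
    # google-cloud-storage and passed to the LangChain pipeline here.
    return f"processing gs://{event['bucket']}/{name}"

print(extract_topics({"bucket": "transcripts", "name": "episode1.txt"}))
```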

If you want to use ChatGPT instead of Gemini, replace the following lines in main.py and replace GOOGLE_API_KEY with OPENAI_API_KEY in the .env file.

# from langchain_google_genai import GoogleGenerativeAI # Comment this
from langchain_openai import OpenAI

# llm = GoogleGenerativeAI(model="gemini-pro") # Comment this
llm = OpenAI()
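Since both integrations read their API key from the environment, the provider swap can also be isolated behind a small helper so only one setting changes. This factory function is a hypothetical sketch, not part of main.py:

```python
# Hypothetical helper isolating the provider choice (names are illustrative).
def make_llm(provider: str):
    """Return a LangChain LLM for the chosen provider; imports are deferred
    so only the installed integration package is needed."""
    if provider == "gemini":
        from langchain_google_genai import GoogleGenerativeAI
        return GoogleGenerativeAI(model="gemini-pro")  # uses GOOGLE_API_KEY
    if provider == "openai":
        from langchain_openai import OpenAI
        return OpenAI()  # uses OPENAI_API_KEY
    raise ValueError(f"unknown provider: {provider}")
```

Deferring the imports inside each branch means installing only langchain-google-genai or langchain-openai is enough, depending on which provider you pick.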
