PGVector support #110

Closed
dcieslak19973 opened this issue Sep 1, 2023 · 8 comments · Fixed by #647

Comments

@dcieslak19973

For larger or more diverse query histories, it would be nice to externalize the vector store to something like Postgres with the vector extension.

@luzer

luzer commented Oct 6, 2023

I would be interested in this as well, for Redis.

@v0nerd

v0nerd commented Oct 12, 2023

You could install the pg_trgm extension and, optionally, pg_similarity.

@andreped
Contributor

This would be of interest to us as well. Any thoughts, @zainhoda? I believe you may use pgvector for your production applications?

If the pgvector vector store implementation is already accessible, it would be nice to avoid implementing it from scratch to add support. But if not, I could give it a go and draft a PR.

@navikohli

This will be a great addition to Vanna. We currently use pgvector for embeddings as well, since it keeps the tech stack leaner by not adding a separate database just for vectors. @zainhoda

@tomthebuzz

Shouldn’t this work out of the box, since the hosted version already uses pgvector? We would just need to ask the team how to enable a self-hosted pgvector instance instead of the cloud-hosted one.

@vanna Team - thanks for an excellent setup and a great starting point. Let’s take this somewhere even better…

@zainhoda
Contributor

@tomthebuzz so generally when people want to use pgvector they want it because they already use postgres for their data. The tricky part comes in the setup (need to install pgvector, create the necessary tables, etc)

We do use pgvector for our hosted vector database and this is a snippet of what that looks like:

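from sqlalchemy import Column, DateTime, ForeignKey, Integer, func, text
from sqlalchemy.orm import sessionmaker, mapped_column, declarative_base, relationship
from pgvector.sqlalchemy import Vector  # provided by the pgvector Python package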

Base = declarative_base()

class DDLEmbedding(Base):
    __tablename__ = 'ddl_embedding'
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime, default=func.current_timestamp(), server_default=text('CURRENT_TIMESTAMP'))
    updated_at = Column(DateTime, default=func.current_timestamp(), server_default=text('CURRENT_TIMESTAMP'), onupdate=func.current_timestamp())
    ddl_id = Column(Integer, ForeignKey('ddl.id'), nullable=False)
    embedding = mapped_column(Vector(1536))

    ddl = relationship("DDL")

class DocumentationEmbedding(Base):
    __tablename__ = 'documentation_embedding'
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime, default=func.current_timestamp(), server_default=text('CURRENT_TIMESTAMP'))
    updated_at = Column(DateTime, default=func.current_timestamp(), server_default=text('CURRENT_TIMESTAMP'), onupdate=func.current_timestamp())
    documentation_id = Column(Integer, ForeignKey('documentation.id'), nullable=False)
    embedding = mapped_column(Vector(1536))

    documentation = relationship("Documentation")

We can't port it over one-for-one from our hosted server because it's tightly integrated with the rest of our data model.

If someone would like to take this and run with it, we're happy to accept a PR for it
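
The setup steps mentioned above (enabling pgvector and creating the necessary tables) could look roughly like the following for a self-hosted instance. This is a minimal sketch, not Vanna's actual schema or migration code: the helper name `pgvector_setup_statements` is hypothetical, the table and column names are copied from the ORM snippet, and the referenced `ddl` and `documentation` tables are assumed to exist already. It only builds the SQL strings; executing them against a database is left to whatever driver you use.

```python
from typing import List

EMBEDDING_DIM = 1536  # matches Vector(1536) in the ORM snippet


def pgvector_setup_statements(table: str, fk_column: str, fk_table: str,
                              dim: int = EMBEDDING_DIM) -> List[str]:
    """Return the SQL needed for one embedding table, in execution order.

    Hypothetical helper: names are illustrative, not part of Vanna.
    """
    return [
        # Enables the `vector` type; pgvector must be installed on the server.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Mirrors the columns of the ORM classes shown in the comment above.
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        f"    id SERIAL PRIMARY KEY,\n"
        f"    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n"
        f"    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n"
        f"    {fk_column} INTEGER NOT NULL REFERENCES {fk_table} (id),\n"
        f"    embedding vector({dim})\n"
        f");",
    ]


if __name__ == "__main__":
    for stmt in pgvector_setup_statements("ddl_embedding", "ddl_id", "ddl"):
        print(stmt)
```

Note that the `onupdate` behavior of `updated_at` in the ORM snippet is applied by SQLAlchemy on the client side, so it has no counterpart in this plain DDL.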

@andreped
Contributor

andreped commented May 16, 2024

so generally when people want to use pgvector they want it because they already use postgres for their data. The tricky part comes in the setup (need to install pgvector, create the necessary tables, etc)

Yeah, that's true, but I would think there are a lot of people who want to use pgvector purely as a vector store, as it is highly performant and better suited for production than, say, Chroma. Chroma's documentation states that the HttpClient is the preferred client for production use (see here), but most people testing Chroma say that it is not that mature yet. Then again, it is great for prototyping and PoCs.

We will be testing pgvector quite soon and will hopefully migrate our solution to it. If we end up with an implementation that makes sense for others (plug-and-play), I could potentially make a PR. Will keep you updated.

Regardless, it would be nice to discuss some implementation details, if we hit a wall, @zainhoda :]

@v0nerd

v0nerd commented May 16, 2024

How can I use this store to implement a customized RAG approach (that is, without using the platform services)?
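
One possible retrieval step for a custom RAG pipeline on top of such a store: embed the user's question with the same model used for the stored embeddings, then fetch the closest rows by cosine distance using pgvector's `<=>` operator. The sketch below only builds the query and its parameters. The function names are hypothetical, the `ddl_embedding` table comes from the snippet earlier in the thread, and the `%s` placeholders assume a psycopg-style driver.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_ddl_query(embedding: Sequence[float],
                      limit: int = 5) -> Tuple[str, tuple]:
    """Build a top-k cosine-distance query against ddl_embedding.

    Hypothetical helper; returns (sql, params) for a psycopg-style
    cursor.execute(sql, params) call.
    """
    sql = (
        "SELECT ddl_id, embedding <=> %s::vector AS distance "
        "FROM ddl_embedding "
        "ORDER BY distance "
        "LIMIT %s"
    )
    return sql, (to_vector_literal(embedding), limit)


if __name__ == "__main__":
    # A 3-dimensional toy vector; real embeddings here would have 1536 values.
    sql, params = nearest_ddl_query([0.1, 0.2, 0.3], limit=3)
    print(sql)
    print(params)
```

The returned `ddl_id` values can then be joined back to the `ddl` table to pull the text that goes into the LLM prompt.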
