Gülçin Yıldırım Jelinek, Staff Engineer at EDB, knows that community is everything when it comes to Postgres. A former PostgreSQL Europe Board member, advocate for women in tech, and recent podcast host, Gülçin shares her expertise at conferences and talks around the world. At a recent Berlin Open Source meetup, she presented an approach to extending Postgres into a vector database with pgvector, making it easier for developers and engineers to accelerate their business.
At the Berlin event, her presentation was called “one of the best explanations of vector search I've seen.” Following the Open Source meetup, we sat down with Gülçin for a Q&A to learn more about the magic of pgvector. You can read our interview below.
Attending PGConf.DE coming up on Friday, April 12th? You can catch Gülçin’s presentation live and connect in person with more EDB Postgres experts sharing their insights.
EDB: What makes vector databases in Postgres so important to businesses?
Gülçin: Vector databases offer a powerful solution for efficiently storing, querying, and analyzing high-dimensional data. Now that Postgres can support vectors, developers and engineers do not need to deal with complex methods for transferring data in and out of Postgres. The good part of having vector data in Postgres is that you can still use your domain knowledge and use vectors to enhance your search experience.
EDB: How useful is vector similarity search in real life? Are any of these techniques used elsewhere?
Gülçin: We are all already using recommendation systems from sources like YouTube and Netflix; everyone is a consumer of this, with or without their knowledge. For example, a new startup named DBTune is using AI to solve the Postgres problem of configuration tuning. There are endless ways to apply ML techniques to problems in the Postgres domain.
EDB: How is pgvector performing? We know it supports HNSW indexes, are there ways to improve the performance of indexes?
Gülçin: The recent version of pgvector (0.6.0) added support for parallel index builds for HNSW, which reduces index build times. This is quite a significant improvement and makes HNSW more attractive. There are also ways to tune further by optimizing index parameters like m and ef_construction, keeping in mind the trade-off between speed and recall. Postgres can also be tuned; indexes build faster when the graph fits into maintenance_work_mem, so this can be arranged. You can increase the number of parallel workers (max_parallel_maintenance_workers), and for a large number of workers you may also need to increase max_parallel_workers.
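As a rough illustration of the settings mentioned above, the sketch below assembles the SQL one might run before and during an HNSW index build. The helper name `hnsw_build_statements`, the table `items`, the column `embedding`, the `vector_l2_ops` operator class, and the values `8GB` and `7` are assumptions chosen for illustration; `m = 16` and `ef_construction = 64` are pgvector's defaults. The right values depend on your data and hardware.

```python
def hnsw_build_statements(table, column, m=16, ef_construction=64,
                          maintenance_work_mem="8GB", parallel_workers=7):
    """Return, in order, the SQL statements for a tuned HNSW index build.

    Illustrative only: identifiers are interpolated directly, so this is
    meant for trusted, hard-coded names, not user input.
    """
    return [
        # Builds are faster when the graph fits into maintenance_work_mem.
        f"SET maintenance_work_mem = '{maintenance_work_mem}';",
        # Postgres setting that caps workers for parallel index builds.
        f"SET max_parallel_maintenance_workers = {parallel_workers};",
        # Higher m / ef_construction tend to improve recall at the cost of
        # build time and index size.
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_l2_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});",
    ]


# Hypothetical table and column names.
for statement in hnsw_build_statements("items", "embedding"):
    print(statement)
```

The statements are returned as plain strings so they can be reviewed or pasted into psql; nothing here connects to a database.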
EDB: Can you help explain how to operationalize a vector extension?
Gülçin: pgvector is an extension and it allows us to perform vector similarity search. Like any other Postgres extension, we need to install it and enable it using the `CREATE EXTENSION` command. By doing this, you get a new data type called vector and new vector operations and functions that allow you to manipulate vectors directly within Postgres queries.
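To make the idea of vector operations concrete, here is a minimal pure-Python sketch of two distance metrics that pgvector exposes as operators (`<->` for Euclidean distance and `<=>` for cosine distance), plus an exact nearest-neighbor lookup. It only mimics what a query ordered by distance does without an index; the rows, ids, and function names are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the metric behind pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the metric behind pgvector's `<=>` operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query, distance, k=2):
    """Exact top-k search: sort every row by its distance to the query."""
    return sorted(rows, key=lambda row: distance(row[1], query))[:k]


# Hypothetical (id, embedding) rows.
items = [(1, [1.0, 2.0, 3.0]), (2, [4.0, 5.0, 6.0]), (3, [3.0, 1.0, 2.0])]
query = [3.0, 1.0, 2.0]

print([row_id for row_id, _ in nearest(items, query, l2_distance)])
print([row_id for row_id, _ in nearest(items, query, cosine_distance)])
```

The two metrics can rank the same rows differently, which is one reason the choice of distance operator (and the matching index operator class) matters.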
Then you need to determine how you want to use pgvector within your applications; this might involve modifying existing SQL queries to leverage vector operations or developing new functionality that takes advantage of the extension’s capabilities. The main use case for many people I have talked to is this: they want to convert their data into vectors by generating embeddings with an embedding model, store them in Postgres with pgvector, and build a simple RAG-style application, such as a chatbot built on a company’s internal documentation. There is also demand for enhancing applications with hybrid queries that combine vector search and traditional keyword search. There is a lot of room for improvement in hybrid search and in making sure indexing mechanisms are up to par with these types of combined queries.
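The retrieval step of such a RAG-style application can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a hypothetical stand-in for a real embedding model (it just counts words), the in-memory `store` list stands in for a Postgres table with a vector column, and the documents are invented. In a real setup the ranking would be done by a pgvector query rather than in Python.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, store, k=1):
    """Return the k stored texts most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda row: cosine_similarity(q, row[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context):
    """Assemble the prompt that would be sent to a language model."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Invented internal documentation snippets.
docs = [
    "vacation requests are approved by your manager",
    "expense reports are due on the last friday of the month",
    "the vpn requires a hardware token",
]
store = [(doc, embed(doc)) for doc in docs]  # stands in for a table of rows

question = "when are expense reports due"
print(build_prompt(question, retrieve(question, store)))
```

The sketch stops at building the prompt; calling a language model with it is left out on purpose.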
EDB: What do you see as the future of vectors? Will pgvector features eventually be part of Postgres, or will Postgres have its own way of doing similarity search?
Gülçin: If any project makes it into Postgres, I believe it will probably be pgvector, given its popularity within the Postgres community. pglogical is a good example: Postgres adapted a lot from pglogical for its logical replication support. The mechanism is the same, but terminology such as publisher/subscriber and the function names are slightly different. I can see that happening with pgvector.
Interested in trying out pgvector? Try it through EDB’s cloud offering with free credits to start.